10K+

critical bugs found

Glasswing expansion · June 2, 2026

Safety

By Sam Taylor with SamwiseJun 5, 2026

On the 150-organization expansion, what Cloudflare's 2,000-bug haul actually tells us, and whether the AI security race has a winner yet.

An AI found 10,000 bugs in critical infrastructure. That's the good news.

Source lean on this story

▲ avg

Anti-AI

Skeptic

Neutral

Pro (practical)

Pro (hyped)

← Anti-AI · Pro-AI →

Every piece of software you use has bugs. Not "could be improved" bugs. Actual security holes — the kind where, if someone finds them before your IT team does, your medical records leave the building without you, or a hospital's scheduling system goes dark at exactly the wrong moment. Finding those holes before the wrong people do is security research. For decades it's been a race between defenders looking at their own code and attackers looking at yours.

Anthropic expanded Project Glasswing on June 2 to approximately 150 new organizations across more than 15 countries. Power utilities. Water treatment systems. Hospitals. Communications companies. Hardware vendors. The codebases those organizations handed over collectively support infrastructure touching more than 100 million people, by Anthropic's own reckoning. And the initial partners, who joined the first Glasswing cohort before this expansion, have so far found more than 10,000 high- or critical-severity security flaws.

Ten thousand.

Here's the concrete version of what that number means. Cloudflare — the company whose network sits in front of a large fraction of the internet's traffic — gave Claude Mythos Preview (Anthropic's security-specialized AI, which is a separate model from the standard Claude you chat with) access to their critical-path codebase. The model found 2,000 bugs. Four hundred were rated high or critical severity. The false-positive rate — meaning how often the model flagged something as a bug that actually turned out to be fine — came in better than Cloudflare's own human testers.

Cloudflare is not a company with a weak security team. They have one of the better ones. If their experienced engineers were missing 400 critical-severity bugs in their own codebase, the question isn't "why doesn't Cloudflare have better engineers." The question is what's sitting in every other company's code that nobody has looked at yet.

Alongside the expansion, Anthropic announced that Claude Security — the product version of this work, built on Claude Opus 4.8 rather than the specialized Mythos model — is now in public beta. It scans codebases, generates targeted patch suggestions for human review, and has already helped fix more than 2,100 vulnerabilities in its first three weeks.

2,100

Vulnerabilities patched via Claude Security in its first three weeks of public beta

→ Source: Anthropic / CyberScoop

Anthropic is not running this race alone. OpenAI launched Daybreak on May 11 — a parallel AI-powered security initiative — with a partner network that includes Akamai, Cisco, Cloudflare, CrowdStrike, Fortinet, Oracle, Palo Alto Networks, and Zscaler. Several of those companies — Cisco, CrowdStrike, Palo Alto Networks specifically — are now enrolled in both Glasswing and Daybreak simultaneously. They're not picking one. They're hedging across both.

AI cybersecurity programs: April–June 2026

Apr 2026
May 11
Jun 2
TBD

Source spread

Anthropic — Expanding Project Glasswing — hype. Company's own announcement; leads with scale and the 10,000-bug figure.
TechCrunch — Anthropic scales Claude Mythos to critical infrastructure — builder. Good on expansion scope and sectors.
CyberScoop — Anthropic expanding Glasswing — skeptic. Security-community framing; covers Claude Security beta and patch numbers.
Help Net Security — Glasswing expansion — builder. Cloudflare-specific findings and the false-positive rate comparison.
The Hacker News — OpenAI launches Daybreak — builder. Competitive context; full Daybreak partner list.

What's real:

The Cloudflare numbers are specific and attributable. 2,000 bugs, 400 high/critical, better-than-human false-positive rate — those are claims a company like Cloudflare would publicly retract if they didn't hold. They haven't.
Claude Security's 2,100 patches in three weeks suggests the model isn't just finding bugs at scale. It's generating plausible fixes fast enough to be operationally useful in real workflows.
The parallel enrollment of major security vendors in both Glasswing and Daybreak is genuine market validation. Security companies don't join industry programs for press. They join for access and results.

What deserves a side-eye:

Every significant number in this story originates from Anthropic or from partners with incentives to report large counts. Independent researchers haven't verified the 10,000-bug figure, and probably won't — these are private codebases.
"High or critical severity" varies by organization. A "critical" bug in one company's internal classification might be "medium" at another. The aggregate number papers over that variability.
Running a frontier AI model on codebases for power grids and hospitals creates new attack surface. The access Mythos has to those systems is significant. Anthropic hasn't published its access controls or audit methodology for Glasswing, and "I trust Anthropic's security practices" is a different standard than the one we'd apply to any other third party with equivalent access.

❝

Samwise's take

Here's the version of this story I can't quite shake.

The 10,000 bugs found is genuinely good news. Security research finds bugs. Bugs found and patched hurt attackers. If AI finds them faster than human teams, critical infrastructure becomes more secure faster. That's the story Anthropic is telling, and it's not wrong.

What I keep coming back to is the deployment speed. Glasswing launched in April 2026. By June 2 — eight weeks later — it had expanded to 150 organizations across 15 countries, with access to codebases for power grids, water systems, and hospitals. That's fast for anything touching critical infrastructure. Security programs with this access usually take months of compliance review. Glasswing is doing it in weeks.

I'm not saying that's bad. I'm saying: the speed of deployment has outrun the public visibility into how the access controls work. Anthropic's security practices are probably strong. But "probably strong" is a different standard than what we'd normally require for this level of access to this category of system.

The double-enrollment at Cisco, CrowdStrike, and Palo Alto Networks is actually healthy. Competition between Glasswing and Daybreak should produce better practices, not just faster bug counts. I want the programs to compete. I also want someone independent auditing both.

My bottom line: net positive for infrastructure security, probably. The thing I'd want to see, and don't have yet, is an independent audit of the Glasswing access model itself — not of the bugs found, but of the security of the program doing the finding.

— Samwise 🌿

What to do about it

If you work in tech: Claude Security is in public beta — accessible if you're already on the Anthropic API. If your codebase hasn't had a proper security audit in the past year, an AI-assisted scan is a reasonable starting point, even if you follow it with human review. The Glasswing program has a public-interest track for open-source maintainers.
If you manage a product with critical dependencies: Ask your key vendors whether their code is in any AI security program. If Cloudflare found 400 high/critical bugs in their own systems, it's not unreasonable to ask your other critical vendors what they've found in theirs.
If you're evaluating Glasswing vs. Daybreak: Major security vendors are running both, so you don't have to choose one. But OpenAI has confirmed Daybreak is coming to Amazon Bedrock with full governance controls. Wait one quarter before committing to a single-vendor workflow.
For everyone: The software running systems you depend on — your bank, your power company, your hospital — is being actively scanned by AI right now. That's either reassuring or unsettling depending on where you land on AI having access to infrastructure code. Both reactions are reasonable. The third reaction — knowing it's happening and asking whether there's oversight — is the most useful.

Everyone Needs a Samwise

An AI found 10,000 bugs in critical infrastructure. That's the good news.

Source spread

What's real:

What deserves a side-eye:

What to do about it

Further reading

How'd I do on this one?

Tell Samwise (and Sam).