Vol. 1 · Edition 024 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

Previous: Opus 4.8 · Now: Fable 5
June 9, 2026 · Anthropic
Model Launch
By Sam Taylor with Samwise

On SWE-bench Verified jumping to 95%, the safety-rerouting architecture that replaces refusals with silent fallbacks, and what it means that this landed five days after the recursive self-improvement paper.

Anthropic ships Mythos to everyone. At $10/M, the price is the argument.

Source lean on this story (scale: ← Anti-AI · Pro-AI →): Anti-AI 0 · Skeptic 1 · Neutral 0 · Pro (practical) 2 · Pro (hyped) 0

Anthropic released Claude Fable 5 on June 9. It is the first publicly available Mythos-class model — the tier that, until now, only existed for Project Glasswing partners working on cyberdefense and critical infrastructure. Starting yesterday, it is available to anyone with an API key, on all the usual clouds.

The pricing is $10 per million input tokens and $50 per million output. That is 2× Opus 4.8 standard, and also exactly what Opus 4.8 Fast Mode costs. It is less than half what Mythos Preview cost when it shipped earlier this year. The price signal is the part I want to sit with, because it says something specific: Anthropic is treating Fable 5 as the new performance-tier standard, not as a premium line item. The frontier just got cheaper.
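To make the "2× Opus 4.8" comparison concrete, here is a minimal sketch of the per-request arithmetic. The prices are the ones quoted above; the function name, the `claude-opus-4-8` label, and the example token counts are my own illustrative assumptions, not figures from Anthropic.

```python
# Standard list prices quoted in this piece, in USD per million tokens.
PRICES = {
    "claude-fable-5": {"input": 10.0, "output": 50.0},
    "claude-opus-4-8": {"input": 5.0, "output": 25.0},  # hypothetical ID, used only as a label
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at standard (uncached) prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative workload (an assumption): 200k input tokens, 20k output tokens.
fable = request_cost("claude-fable-5", 200_000, 20_000)
opus = request_cost("claude-opus-4-8", 200_000, 20_000)
print(f"Fable 5: ${fable:.2f}, Opus 4.8: ${opus:.2f}, ratio: {fable / opus:.1f}x")
# prints: Fable 5: $3.00, Opus 4.8: $1.50, ratio: 2.0x
```

Because both input and output prices double, the ratio stays at 2× regardless of the input/output mix.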

Five days before this launch, Anthropic's Institute published the recursive self-improvement paper calling for a globally coordinated pause option and reporting that Claude now writes more than 80% of its own merged production code. I covered that piece separately. The timing here is worth flagging anyway. A lab that ships its most capable-ever public model five days after publishing research calling for a conditional pause option is not being incoherent — it is making a specific argument: progress continues, and governance architecture is the question, not capability architecture. Whether you find that argument convincing depends on how much you trust Anthropic to actually build the governance side. I don't have a confident answer there. But the argument is a real one, and it deserves engagement rather than a headline about irony.

Anyway. What is Fable 5, and should you upgrade?

What the benchmarks actually say

SWE-bench Verified — the canonical real-world software-engineering benchmark, methodology published — scores Fable 5 at 95.0%. Opus 4.8 is at 88.6%. That 6.4-point gain at the top of the capability curve is not cosmetic. At 88%+, each additional point on SWE-bench Verified corresponds to increasingly difficult, edge-case-dense tasks — the kind that fail regardless of how you tune the prompt.

SWE-bench Pro — Scale AI's harder variant, less susceptible to training-data leakage — shows a larger gap. Fable 5: 80.3%. Opus 4.8: 69.2%. GPT-5.5: 58.6%. The 11-point gap over Opus 4.8 on the harder benchmark is the number I'd weight most when deciding whether this is a real step or a saturated-benchmark artifact. It is real.
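One way to read those point gains is as a relative drop in failures rather than a rise in passes. The sketch below does that arithmetic on the scores quoted above. The function name is mine, and treating benchmark failure rates as a proxy for real-world failures is an assumption, not something the benchmarks claim.

```python
def failure_reduction(old_pass: float, new_pass: float) -> float:
    """Relative drop in failure rate when the pass rate (in percent) moves from old_pass to new_pass."""
    old_fail = 100.0 - old_pass
    new_fail = 100.0 - new_pass
    return (old_fail - new_fail) / old_fail

verified = failure_reduction(88.6, 95.0)  # failing share: 11.4% -> 5.0%
pro = failure_reduction(69.2, 80.3)       # failing share: 30.8% -> 19.7%
print(f"Verified: {verified:.0%} fewer failures, Pro: {pro:.0%} fewer failures")
# prints: Verified: 56% fewer failures, Pro: 36% fewer failures
```

Seen this way, the 6.4-point gain on Verified means roughly half as many failed tasks, which is why it is not cosmetic at this end of the curve.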

95%
Fable 5 on SWE-bench Verified — up from 88.6% for Opus 4.8

→ Source: BenchLM.ai

A caveat I'll say explicitly because it matters: the SWE-bench Verified numbers are from third-party aggregators citing Anthropic's announcement page, which I cannot independently verify against the raw leaderboard data today. The methodology is public. The numbers are consistent across sources. But independent reproduction studies don't exist yet for Fable 5's full benchmark suite — and that matters for anyone deciding to stake production systems on these claims.

The safety architecture is the part most coverage is underweighting

Fable 5 and Mythos 5 are the same base model. That sentence is the whole story. The difference is what happens with a specific slice of queries.

When Fable 5's classifiers detect a request in cybersecurity, biology/chemistry, or model-distillation territory, the request is silently rerouted to Claude Opus 4.8. Not refused. Not flagged. Answered — by a different model. This triggers in less than 5% of sessions. For the other 95%+, Fable 5 performs identically to Mythos 5.

Mythos 5 lifts those classifiers for vetted Project Glasswing partners. Cyberdefense organizations and critical infrastructure teams. Not generally available.
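As a toy illustration of the pattern described above (not Anthropic's implementation, which is not public), the routing decision can be sketched as a pure function. The topic labels, the `mythos-5` and `claude-opus-4-8` identifiers, and the idea that a classifier emits a single topic label are all assumptions made for the example.

```python
# Restricted areas named in this piece; the exact labels are illustrative.
RESTRICTED_TOPICS = {"cybersecurity", "biology/chemistry", "model-distillation"}

def serving_model(requested: str, topic: str) -> str:
    """Return which model answers, given the requested model and a classifier's topic label."""
    if requested == "mythos-5":
        return "mythos-5"         # classifiers lifted for vetted partners
    if requested == "claude-fable-5" and topic in RESTRICTED_TOPICS:
        return "claude-opus-4-8"  # rerouted silently; the caller still gets an answer
    return requested

print(serving_model("claude-fable-5", "web-app refactor"))  # claude-fable-5
print(serving_model("claude-fable-5", "cybersecurity"))     # claude-opus-4-8
print(serving_model("mythos-5", "cybersecurity"))           # mythos-5
```

The point of the sketch is that no branch returns a refusal: every request gets a serving model, and only the identity of that model changes.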

The rerouting design is worth thinking about carefully. "Silent rerouting to a safer model" is different from "refusal" in ways that matter.

Better UX: no friction, no failed request, no error message.

But also: you cannot detect the rerouting from outside the system without knowing the architecture. If your application uses Fable 5 and logs model outputs for compliance, reproducibility, or auditing purposes, you need to know whether your API response headers include model-identity information. If they don't, you might be logging Opus 4.8 outputs under a claude-fable-5 request. That is not a hypothetical gotcha; it is an operational question with real compliance implications in regulated industries.

I'm not calling this a bad design. It's a reasonable design. I am saying: understand it before you deploy in contexts where model provenance matters.
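For audit logging, one defensive pattern is to record both the model you asked for and whatever model the response reports, and to mark provenance as unknown when nothing is reported. This is a minimal sketch under a loud assumption: that the response carries a top-level `model` field and that it reflects the model that actually answered. The piece above notes Anthropic has not confirmed this for rerouted requests, and the response dictionaries here are hypothetical.

```python
import json
from datetime import datetime, timezone

def provenance_record(requested_model: str, response: dict) -> dict:
    """Build an audit-log entry recording requested vs. reported model.

    Assumes the response has a top-level "model" field; whether that field
    reflects a rerouted model is NOT confirmed by any source cited here.
    """
    reported = response.get("model")  # None if the field is absent
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "requested_model": requested_model,
        "reported_model": reported,
        "provenance_known": reported is not None,
        "model_mismatch": reported is not None and reported != requested_model,
    }

# Hypothetical response bodies, for illustration only.
same = provenance_record("claude-fable-5", {"model": "claude-fable-5"})
rerouted = provenance_record("claude-fable-5", {"model": "claude-opus-4-8"})
unknown = provenance_record("claude-fable-5", {})
print(json.dumps(rerouted, indent=2))
```

The exact-match comparison is deliberately strict. If reported IDs turn out to carry version suffixes, the comparison would need a normalization step before it is useful.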

Fable 5 vs Opus 4.8 vs GPT-5.5

| Metric | Claude Fable 5 | Claude Opus 4.8 | GPT-5.5 |
|---|---|---|---|
| SWE-bench Verified | 95.0% | 88.6% | |
| SWE-bench Pro | 80.3% | 69.2% | 58.6% |
| Input price (standard) | $10/M | $5/M | $5/M |
| Output price (standard) | $50/M | $25/M | $20/M |
| Agentic session length | Days | Hours | Hours |
| Safety rerouting | < 5% of sessions | None | None |
| GitHub Copilot | GA June 9 | Yes | Yes |

Pros & cons

What's real:

  • The SWE-bench Verified gain from 88.6% to 95.0% is meaningful. At that capability level, a 6-point delta shows up as fewer failed agent sessions and lower end-to-end cost per completed task.
  • $10/$50 pricing positions Fable 5 as the new performance-tier standard, not a premium. The price of Mythos-class capability fell by more than half in roughly six months. That is the trend line worth tracking.
  • The June 9–22 free window on Pro, Max, Team, and Enterprise makes evaluation essentially zero-cost. That is the right move for a launch this significant.
  • GitHub Copilot GA on launch day means builders who run on Microsoft's toolchain don't need to wait for API access to evaluate.
  • The silent rerouting design means the vast majority of production workflows — anything outside cybersecurity/bio-chem/distillation — will run on the full Mythos-grade model without any modification.

What deserves a side-eye:

  • The rerouting architecture raises a model-provenance question that Anthropic hasn't addressed publicly: can callers identify which model actually generated a given response? For regulated industries and audit trails, this matters.
  • Independent reproductions of the benchmark suite don't exist yet. SWE-bench Verified methodology is public; other numbers are first-party or aggregated third-party claims.
  • "Works for days in an agent harness" is real in principle. In practice, a multi-day Fable 5 session at $10/$50 per million tokens can be expensive. The billing math is not trivial, and Anthropic doesn't show a cost estimate before you start a long-running session.

When run in an agent harness, Claude Fable 5 can work for days at a time: planning across stages, delegating to sub-agents, and checking its own work.

Anthropic — Claude Fable 5 and Mythos 5 announcement

What builders need to know

  • Model ID is claude-fable-5. Available now on the Claude Platform, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.
  • Free through June 22 on Pro, Max, Team, and seat-based Enterprise. From June 23, usage credits are required. Evaluate now; this is a real window.
  • Run your existing prompt suite against Fable 5 before flipping production traffic. Capability jumps change model behavior at the edges. Any prompt that relies on specific refusal patterns or safety responses needs re-evaluation given the rerouting architecture.
  • In compliance-sensitive contexts: check whether the API response headers identify which model actually generated a response. Silent rerouting means claude-fable-5 calls may occasionally produce Opus 4.8 outputs. Log accordingly.
  • For multi-step agentic runs: set token-consumption alerts before starting. "$50 per million output tokens × a days-long run" is a billing scenario worth planning for, not discovering after.
  • The 90% prompt-caching discount still applies on Fable 5 input tokens, same as Opus 4.8. Factor that in if you're comparing effective costs for cached-prompt workflows.
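The last two bullets, token-consumption alerts and the caching discount, can be combined into one small spend tracker. This is a sketch, not a billing tool: the prices and the 90% discount come from this piece, while the class name, the alert threshold, and the per-call token counts are illustrative assumptions. It also ignores any extra charge for writing to the cache.

```python
from dataclasses import dataclass

INPUT_PER_M = 10.0     # USD per million input tokens (from this piece)
OUTPUT_PER_M = 50.0    # USD per million output tokens (from this piece)
CACHE_DISCOUNT = 0.90  # 90% off cached input tokens (from this piece)

@dataclass
class BudgetTracker:
    """Accumulates spend for a long agent run and flags when a threshold is reached."""
    alert_usd: float
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> bool:
        """Add one call's usage; return True once cumulative spend reaches alert_usd.

        cached_input_tokens is the portion of input_tokens billed at the cached rate.
        """
        fresh = input_tokens - cached_input_tokens
        cost = (
            fresh * INPUT_PER_M
            + cached_input_tokens * INPUT_PER_M * (1 - CACHE_DISCOUNT)
            + output_tokens * OUTPUT_PER_M
        ) / 1_000_000
        self.spent_usd += cost
        return self.spent_usd >= self.alert_usd

tracker = BudgetTracker(alert_usd=25.0)
# Illustrative run: 10 calls, each with 500k input tokens (400k cached) and 30k output tokens.
alerts = [tracker.record(500_000, 30_000, cached_input_tokens=400_000) for _ in range(10)]
print(f"spent ${tracker.spent_usd:.2f}; alert first fired on call {alerts.index(True) + 1}")
# prints: spent $29.00; alert first fired on call 9
```

In this example each call costs about $2.90, and $1.50 of that is output. Caching cuts the input side sharply, but on a days-long run the output tokens are what the alert threshold mostly guards against.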
