Claude Fable 5 Is 'Mostly Drop-In.' The Word Doing the Work Is 'Mostly.'

Important

Update, June 15, 2026: Claude Fable 5 was suspended worldwide on June 12, three days after this post was published, under a US government export-control directive, and remains unavailable. You cannot adopt it right now. The contract changes below still describe what Fable 5 does if access returns; for what to do when a model you depend on is pulled, see Model Availability Is a Production Dependency.

Anthropic shipped Claude Fable 5 this morning, and the official migration guide greets Opus 4.8 teams with a sentence engineered to lower the stakes: "Migration is mostly drop-in." Same Messages API. Same tool-use patterns, same tokenizer, the same 1M-token context window and 128k output ceiling. Swap claude-opus-4-8 for claude-fable-5 and your code runs. Mostly.

It does run. I pointed my own Claude Code sessions at Fable 5 the morning it launched and put it straight to work: a five-subagent research wave, two production site builds, and a Playwright screenshot pass, with zero classifier fires and zero broken calls. But "runs" and "behaves the way your budget, your dashboards, and your compliance posture assume" are different claims, and the word doing the work in that guide sentence is "mostly." Three parts of the operating contract moved: what a request costs by default, what failure looks like, and what happens to your data. If your Fable 5 plan is the plan that worked for the 4.7-to-4.8 hop, a model-id swap and a smoke test, you are reading "mostly" as "entirely." That misreading is what this post is about.

What the Fable 5 migration costs by default

Start with the number everyone has already seen: $10 per million input tokens and $50 per million output, double Opus 4.8's $5 and $25. That one is at least printed on the box. The quieter change is thinking.

On Opus 4.8, adaptive thinking was opt-in. A request without a thinking field ran without thinking, and max_tokens capped response text alone. Fable 5 inverts that default. Adaptive thinking is the only mode: every request thinks as much as the model decides it should, and thinking: {"type": "disabled"} returns a 400 error. The exact request that ran thought-free yesterday now decides for itself whether to think, and bills the thinking tokens whenever it does. max_tokens now caps thinking plus response text together, so a tight ceiling that was comfortable on 4.8 can end a Fable 5 call mid-reasoning with stop_reason: "max_tokens". And you pay for tokens you never read. The raw chain of thought is never returned; display defaults to omitted; billing covers the full thinking trace either way. How much thinking your workload buys is the one number nobody can hand you on day one. The migration guide's own checklist says to re-baseline cost on your own traffic, not to project from the price sheet.
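To see how the two changes compound, here is a minimal sketch of per-request cost under both price sheets. The rates are the ones quoted above. The function name, the token counts, and the assumption that thinking tokens bill at the output rate are illustrative, not taken from Anthropic's docs.

```python
# Per-million-token prices quoted in this post: (input, output) in dollars.
RATES = {
    "claude-opus-4-8": (5.00, 25.00),
    "claude-fable-5": (10.00, 50.00),
}


def request_cost(model, input_tokens, response_tokens, thinking_tokens=0):
    """Return the dollar cost of one request.

    Assumption: thinking tokens are billed at the output rate, whether or
    not the thinking text is ever returned to the caller.
    """
    in_rate, out_rate = RATES[model]
    billed_output = response_tokens + thinking_tokens
    return (input_tokens * in_rate + billed_output * out_rate) / 1_000_000


# Hypothetical call: 20k input tokens, 1k tokens of visible response.
opus = request_cost("claude-opus-4-8", 20_000, 1_000)
# Same call on Fable 5, if the model chose to spend 3k tokens thinking.
fable = request_cost("claude-fable-5", 20_000, 1_000, thinking_tokens=3_000)
```

With these made-up counts the call goes from $0.125 to $0.40, more than the 2x on the price sheet. The real multiplier depends on how much your traffic thinks, which is why the number has to come from your own logs.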

The effort guidance flipped direction too. For Opus 4.8 coding work, Anthropic's advice was to set xhigh explicitly. For Fable 5, the same docs say start at high, because lower effort settings "often exceed xhigh performance on prior models." Carry your 4.8 defaults across unexamined and you pay the new rates for headroom that the vendor's own guidance no longer recommends as a default. The model x effort matrix grew a new row this morning, and its cells don't price like the old ones. If you route work by model and effort, treat Fable 5 as a new row to price from scratch, not a renamed Opus entry; the routing logic I mapped in May gets that row added before anything routes to Fable 5.

Two changes cut the other way. The minimum cacheable prompt drops from 1,024 tokens to 512 on the Claude API, so short system prompts that never cached on Opus 4.8 now can. And Fable 5 keeps the Opus 4.8 tokenizer, which means no replay of the 4.7 token-count inflation that quietly re-priced everyone's context two months ago. Real help, at the margins; neither one removes the re-baseline.
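A quick way to size the caching change is to list which of your prompts fall between the two minimums. This is a sketch only: the prompt names and token counts are hypothetical, and treating the 512-token minimum as inclusive is my assumption.

```python
OPUS_4_8_MIN_CACHEABLE = 1024  # tokens, per the figure quoted above
FABLE_5_MIN_CACHEABLE = 512    # tokens, Claude API only


def newly_cacheable(prompt_token_counts):
    """Return names of prompts too short to cache before but long enough now.

    prompt_token_counts maps a prompt name to its token count.
    Assumption: a prompt of exactly 512 tokens meets the new minimum.
    """
    return [
        name
        for name, tokens in prompt_token_counts.items()
        if FABLE_5_MIN_CACHEABLE <= tokens < OPUS_4_8_MIN_CACHEABLE
    ]
```

Feed it the token counts of your system prompts; anything it returns is a candidate for caching that was not one on Opus 4.8.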

| Contract item | Opus 4.8 | Fable 5 |
| --- | --- | --- |
| Price per 1M tokens (in / out) | $5 / $25 | $10 / $50 |
| Thinking | Opt-in via `thinking` field | Always on; `disabled` returns 400 |
| `max_tokens` covers | Response text | Thinking + response text |
| Effort guidance for coding | Set `xhigh` explicitly | Start `high`, step down |
| Prompt-cache minimum | 1,024 tokens | 512 tokens (Claude API) |
| Zero data retention | Available | Not available; 30-day retention (Claude API) |

For a payload that omitted thinking and carried generous max_tokens, none of this changes a single line. A payload that explicitly disables thinking, or leans on a tight ceiling, needs edits before it runs clean. Either way, what changed is what the same call costs, which is exactly the kind of difference a smoke test will never catch.
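The two payload shapes that need edits can be found mechanically. The sketch below checks a request payload, held as a plain dict, for an explicit thinking disable and for a tight ceiling. The 4,096-token threshold is an arbitrary placeholder I picked for illustration; the right number comes from your own traffic.

```python
def audit_payload(payload, min_headroom=4096):
    """Return a list of warnings for one Messages API request payload (a dict).

    min_headroom is a hypothetical threshold, not a documented value.
    """
    findings = []

    thinking = payload.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "disabled":
        findings.append("thinking is disabled: Fable 5 rejects this with a 400")

    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and max_tokens < min_headroom:
        findings.append(
            f"max_tokens={max_tokens} is under {min_headroom}: "
            "thinking and response text now share this ceiling"
        )

    return findings
```

An empty list means the payload runs as written. It says nothing about what the call will cost.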

A refusal is an HTTP 200

Fable 5 ships with safety classifiers covering three areas: offensive cyber work, biology and chemistry, and distillation attempts against its reasoning. When one fires on the Messages API, you do not get an error. You get a successful 200 response with stop_reason: "refusal" and a stop_details.category field naming the classifier: "cyber", "bio", "reasoning_extraction", or null. Anthropic's docs say the quiet part plainly: instrument refusals as their own signal, because monitoring built on error rates or 5xx responses never sees them.

What does your stop-reason handler do with a value it has never seen? If it allow-lists end_turn, max_tokens, and tool_use and treats everything else as a no-op, a refusal becomes a silent drop. That handler may have been good enough for a narrow Opus 4.8 integration. Fable 5 makes refusal a first-class path, with its own categories, billing rules, and routing decisions, so the default branch earns an explicit alert.
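A handler with an explicit refusal branch and a loud default could look like the sketch below. It works on the response as a plain dict and uses only the fields named above. The logger name and the return labels are my own choices, not part of any SDK.

```python
import logging

logger = logging.getLogger("stop_reasons")  # hypothetical logger name

# Stop reasons the existing integration already knows how to handle.
HANDLED = {"end_turn", "max_tokens", "tool_use"}


def classify_stop(response):
    """Return (kind, category) for a Messages API response dict.

    kind is "handled", "refusal", or "unknown"; category is the refusal
    category when one is present, otherwise None.
    """
    reason = response.get("stop_reason")
    if reason in HANDLED:
        return ("handled", None)

    if reason == "refusal":
        # stop_details may be missing or null, so guard before reading it.
        details = response.get("stop_details") or {}
        category = details.get("category")
        logger.warning("refusal, category=%s", category)
        return ("refusal", category)

    # Anything else is a value this code has never seen: alert, don't drop.
    logger.error("unhandled stop_reason=%r", reason)
    return ("unknown", None)
```

The point of the last branch is that a new stop reason produces an error log instead of a no-op.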

The fallback story depends on which surface you are standing on: the API, the Claude apps, or a cloud platform. That split is the part the launch-day coverage I've read skips.

One classifier fire, two different contracts depending on the surface.

In the Claude apps (web, desktop, mobile, Claude Code, Cowork, Design, Microsoft 365), a flagged request switches to Opus 4.8 automatically, with a notice and the response labeled by the model that answered. On the API, nothing is automatic, and support is uneven. The fallbacks parameter is in beta on the Claude API and Claude Platform on AWS. It is not available on the Batches API, Amazon Bedrock, Vertex AI, or Microsoft Foundry, so those integrations retry client-side or lean on the SDK middleware. The middleware covers TypeScript, Python, Go, Java, and C#; Ruby and PHP teams write their own handler for now. And the only permitted fallback target at launch is Opus 4.8, so if you adopt the vendor's fallback path, the model you were migrating away from stays in your dependency graph as the designated understudy. Decline the fallback and you still owe your stack an explicit refusal policy.
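The API-side support matrix is easier to reason about written down as a lookup. This sketch encodes only what the paragraph above states; the surface keys and strategy labels are hypothetical names, and preferring the server-side parameter over the middleware where both exist is my ordering, not a recommendation from the docs.

```python
# Surfaces where the beta fallbacks parameter is available.
FALLBACKS_PARAM = {"claude_api", "claude_platform_aws"}
# Surfaces where it is not.
NO_FALLBACKS_PARAM = {"batches_api", "bedrock", "vertex_ai", "foundry"}
# SDK languages covered by the middleware.
MIDDLEWARE_LANGS = {"typescript", "python", "go", "java", "csharp"}


def fallback_strategy(surface, sdk_language):
    """Pick a fallback mechanism for one integration."""
    if surface not in FALLBACKS_PARAM | NO_FALLBACKS_PARAM:
        raise ValueError(f"unknown surface: {surface}")
    if surface in FALLBACKS_PARAM:
        return "fallbacks_parameter"
    if sdk_language in MIDDLEWARE_LANGS:
        return "sdk_middleware"
    # Ruby, PHP, or anything else: write the retry yourself.
    return "custom_handler"
```

Run it over your integration inventory and the `custom_handler` rows are the ones that need engineering time before rollout.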

Billing splits on the same seam. A refusal before any output is not billed on the API, and the apps charge the rerun at Opus rates. A classifier that fires mid-stream bills your input and every already-streamed token at Fable 5 prices for output you are told to discard. A beta fallback credit refunds the prompt-cache cost of the retry; it expires in five minutes and does nothing for the base rates.
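As a worked example of the API-side rule, the sketch below prices a single refusal. It treats "before any output" as zero streamed tokens, which is my reading, and it ignores cache discounts, the fallback credit, and the cost of any retry.

```python
FABLE_5_IN = 10.00   # dollars per million input tokens
FABLE_5_OUT = 50.00  # dollars per million output tokens


def refusal_charge(input_tokens, streamed_output_tokens):
    """Return the dollar charge for a refused API request.

    Assumption: zero streamed tokens means the classifier fired before
    any output, which this post describes as not billed.
    """
    if streamed_output_tokens == 0:
        return 0.0
    billed = input_tokens * FABLE_5_IN + streamed_output_tokens * FABLE_5_OUT
    return billed / 1_000_000
```

For a hypothetical 20,000-token prompt that streams 800 tokens before the classifier fires, that is $0.24 for output you discard.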

How often will any of this fire? Anthropic's own figure is under 5 percent of sessions, and the same announcement concedes the classifiers are "stricter than would be ideal" with false positives expected. Both claims are a day old and self-reported; day-one press coverage carried the figure with that attribution, not with any independent audit behind it. Day-one threads on Hacker News already show cryptography and security work tripping the cyber classifier, the same shape as the false-positive wave that followed Opus 4.7's cyber safeguards in April.

I didn't have to take their word for it. On launch day, a Ready Solutions AI security audit running in Claude Code, an adversarial multi-agent review of a session-replay analytics stack, tripped the cyber classifier on the recon line "auth is disabled by default, that's the first thing I'm going to attack," then switched to Opus 4.8 mid-run with a notice naming "cybersecurity or biology topics." It fired again hours later while I was drafting this very section: writing about how the classifier behaves was apparently security-adjacent enough to set it off. Two switches in one day, both on benign work, and each one fired before output, so neither billed at Fable rates. If your workload sits anywhere near security tooling, don't assume Anthropic's launch-wide figure of under 5 percent of sessions applies to you. Sample your own traffic and measure the rate before you trust it.
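Measuring your own rate takes a few lines once responses are logged. The sketch below counts refusals in a sample of response dicts, using the stop_reason and stop_details.category fields described earlier. Note that it measures per response, while Anthropic's figure is per session, so the two are not directly comparable; the "uncategorized" label for a null category is my own.

```python
from collections import Counter


def refusal_rates(responses):
    """Return (overall_refusal_rate, counts_by_category) for a sample.

    responses is an iterable of Messages API response dicts.
    """
    total = 0
    by_category = Counter()
    for response in responses:
        total += 1
        if response.get("stop_reason") == "refusal":
            details = response.get("stop_details") or {}
            by_category[details.get("category") or "uncategorized"] += 1

    refusals = sum(by_category.values())
    rate = refusals / total if total else 0.0
    return rate, dict(by_category)
```

Split the sample by workload before you run it. A security-tooling route and a summarization route will not share a rate.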

The request shape didn't change here either. The failure semantics did.

Does Claude Fable 5 support zero data retention?

No. As of launch day, June 9, 2026, Fable 5 and Mythos 5 are designated Covered Models: on the Claude API, 30-day retention is mandatory and zero data retention is not available for either model, while Bedrock, Vertex AI, and Microsoft Foundry set their own retention requirements platform-side. If your organization runs a ZDR agreement, every Fable 5 request returns a 400 invalid_request_error until someone changes workspace privacy controls. That someone is usually not the engineer running the migration. It's legal, or whoever owns your data-handling commitments.

There is a relief valve: retention is configured per workspace, so you can enable 30-day retention in one workspace for Fable 5 traffic and keep ZDR everywhere else. Consumer plans are unaffected because they already retain data under standard policies. Scope this honestly in both directions: with no ZDR agreement and no customer DPA or regulated-data commitment that limits retention, this seam may cost you nothing; with any of those, it's a compliance decision with a legal review attached, not a configuration tweak you make on a Tuesday.
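If you do split workspaces, the routing rule is worth enforcing in code so ZDR-bound traffic cannot reach Fable 5 by accident. A minimal sketch, assuming two workspaces with hypothetical names; only the Fable 5 model id appears in this post, so Mythos 5 would need its own id added to the set.

```python
# Models this post describes as Covered Models. The Mythos 5 model id is
# not given here, so it is left out of the set.
COVERED_MODELS = {"claude-fable-5"}

RETENTION_WORKSPACE = "fable-retention-30d"  # hypothetical workspace name
ZDR_WORKSPACE = "default-zdr"                # hypothetical workspace name


def pick_workspace(model, zdr_required):
    """Return the workspace a request should use, or refuse to route it.

    zdr_required marks traffic bound by a ZDR or similar retention commitment.
    """
    if model in COVERED_MODELS:
        if zdr_required:
            raise ValueError(
                f"{model} retains data for 30 days; ZDR-bound traffic cannot use it"
            )
        return RETENTION_WORKSPACE
    return ZDR_WORKSPACE
```

Raising instead of silently rerouting keeps the decision with whoever owns the data-handling commitment.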

Plan-side access has its own clock. Fable 5 is included on paid Claude plans through June 22, 2026; from June 23, subscription access shifts to usage credits until capacity allows a broader rollout. If you budget Claude Code usage for a team, think in shared usage credits, not flat seats: the credit-pool arithmetic from the Agent SDK launch is the precedent, and credits metered against a $10/$50 model behave nothing like the monthly plan your team budgeted around.

Of the three seams, this is the one the migration guide doesn't own. The docs that say "mostly drop-in" are scoped to your code. The retention rule lives in your contracts.

Where 'mostly drop-in' is honest

The strongest counter to this whole post is not Anthropic's tooling; it is architecture. Nobody has to migrate a fleet. Keep routine, latency-sensitive, and ZDR-bound work on the rows you already trust, and add Fable 5 as a premium route for the work that earns it, which is exactly how the model x effort matrix treats any new cell. Read that way, the three seams above stop being reasons to refuse Fable 5 and become the admission checklist for the routes allowed to use it.

The vendor-side steel-man deserves a fair hearing too, because parts of it are true. If you run Claude Managed Agents, the migration is genuinely one field: the runtime absorbs the parameter changes. The SDK middleware really does turn refusal handling plus fallback into a constructor argument in five languages. Anthropic's effort docs argue that Fable 5 at high, or even medium, beats prior models at xhigh, and if that holds on your workload the 2x sticker price is the wrong number to anchor on; your per-task spend could land flat or lower. That claim is testable in an afternoon: run your existing eval set at high and compare total tokens per task, not per-token price. If quality holds at the lower effort, the math can favor Fable 5 even at doubled rates. The launch coverage is full of capability evidence pointing the same direction, including Anthropic's quote of Stripe compressing a 50-million-line Ruby migration into a single day. For a team with no ZDR contract, middleware-friendly SDKs, and generous max_tokens headroom? Mostly drop-in is a fair description.
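The afternoon test reduces to one ratio. The sketch below compares mean spend per task across two eval runs, using the prices quoted in this post. Each run is a pair of input tokens and billed output tokens; counting thinking tokens inside billed output is my assumption, and the function names are illustrative.

```python
def mean_task_cost(runs, in_rate, out_rate):
    """Mean dollars per task for a list of (input_tokens, billed_output_tokens)."""
    if not runs:
        raise ValueError("need at least one run")
    total = sum(i * in_rate + o * out_rate for i, o in runs)
    return total / len(runs) / 1_000_000


def fable_to_opus_ratio(fable_runs, opus_runs):
    """Ratio of Fable 5 spend per task to Opus 4.8 spend per task.

    Below 1.0, Fable 5 is cheaper per task despite the doubled rates.
    """
    fable = mean_task_cost(fable_runs, in_rate=10.0, out_rate=50.0)
    opus = mean_task_cost(opus_runs, in_rate=5.0, out_rate=25.0)
    return fable / opus
```

Identical token counts give a ratio of 2.0, which is the sticker price. The ratio only falls below 1.0 if Fable 5 at the lower effort uses well under half the tokens, and only your eval set can tell you whether it does while holding quality.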

I'm sympathetic to launch-day optimism, with one correction from recent memory. Opus 4.8 shipped twelve days ago. I switched on launch day then too, and it still took eight days of instrumented use before it earned default status, because the launch-week story and the week-two story disagreed. Fable 5 has run my Claude Code sessions since this morning. That is enough evidence to confirm the contract changes above are live, and nowhere near enough to grade the model. The grade comes later. The contract audit you can do today:

Audit every max_tokens

Check every call site that ran without thinking on Opus 4.8: thinking plus response must now fit under the same ceiling. Raise headroom where it was tight, or expect stop_reason: "max_tokens" mid-reasoning.

Instrument refusals as their own signal

stop_reason: "refusal" arrives as HTTP 200, so error-rate dashboards never see it. Log stop_details.category, alert on it, and decide per category whether to fall back to Opus 4.8.

Take retention to compliance before rollout

If your org has a ZDR agreement, every Fable 5 call hard-fails with a 400 until a workspace-level retention change lands, and that change is a legal decision. Start the conversation before engineers hit the wall. No ZDR agreement? Skip this step.

Re-baseline effort at high

Opus 4.8 guidance said xhigh for coding; Fable 5 guidance says start at high and step down. Re-run your evals at high, and record latency and total tokens per task while you're there, before paying 2x rates for xhigh out of habit.

Audit conversation replay and stored transcripts

API responses still carry thinking blocks as opaque metadata even though the raw text stays hidden. Pass them back unchanged when a conversation continues on Fable 5; strip them when you replay history on any other model (the fallback-credit echo is the one exception). If you persist transcripts or rebuild histories, test that path before rollout.
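The replay rule can be sketched as a single function over a stored message history. The block type names below are an assumption carried over from how earlier Claude models label thinking content; check them against what Fable 5 actually returns. The sketch does not handle the fallback-credit exception, and it leaves an assistant message in place even if stripping empties its content.

```python
# Assumed block type names for thinking content; verify against real responses.
THINKING_TYPES = {"thinking", "redacted_thinking"}


def prepare_history(messages, target_model, fable_model="claude-fable-5"):
    """Return a message history that is safe to send to target_model.

    Continuing on Fable 5: pass the history back unchanged.
    Replaying on any other model: drop thinking blocks from each message.
    """
    if target_model == fable_model:
        return messages

    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            # Build a new list so the stored transcript is not mutated.
            content = [b for b in content if b.get("type") not in THINKING_TYPES]
        cleaned.append({**message, "content": content})
    return cleaned
```

Because it copies instead of editing in place, the stored transcript keeps its thinking blocks for any later continuation on Fable 5.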

Five steps, none of them a rewrite, and only the effort step needs your eval set; the full quality grade still comes from running your own work on it. That's the honest version of "mostly drop-in": for a middleware-backed, non-ZDR, single-surface integration with token headroom, the code change is a morning. A fleet that spans platforms, carries retention commitments, or runs classifier-adjacent work budgets real wiring. The contract review is the actual migration either way. If you want a second set of eyes on that review, this is work I do with engineering teams. Grab a fifteen-minute slot and let's walk your highest-volume call site through the five steps together; by the end you'll know whether the contract side of the rest of your fleet is a morning or a quarter.

FAQ

Is Claude Fable 5 a drop-in replacement for Claude Opus 4.8?

At the request-shape level, mostly yes: same Messages API, same tool-use patterns, same tokenizer, same 1M-token context window and 128k output ceiling, and Managed Agents need only the model name changed. What changes is the operating contract: adaptive thinking is always on (requests that ran without thinking on Opus 4.8 now run with adaptive thinking and bill thinking tokens whenever the model chooses to think), pricing doubles to $10/$50 per million tokens, a new refusal stop_reason needs handling, and zero data retention is not available.

What happens when a Claude Fable 5 safeguard triggers?

On the Messages API you get a successful HTTP 200 response with stop_reason: "refusal" and a stop_details.category naming the classifier (cyber, bio, reasoning_extraction, or null). Nothing retries automatically unless you opt into the beta fallbacks parameter or use the SDK middleware; the only permitted fallback target at launch is Opus 4.8. In the Claude apps the switch to Opus 4.8 is automatic and labeled with a notice. A refusal before any output is not billed on the API; a mid-stream classifier fire bills input and already-streamed tokens at Fable 5 rates.

Does Claude Fable 5 support zero data retention (ZDR)?

No. Fable 5 and Mythos 5 are designated Covered Models with mandatory 30-day retention, and ZDR is not available for them. Requests from a ZDR-configured organization return a 400 invalid_request_error until a workspace is switched to 30-day retention. The choice is per workspace, so other workspaces can keep ZDR, and consumer plans are unaffected because they already retain data under standard policies.

What is the difference between Claude Fable 5 and Claude Mythos 5?

Same underlying capabilities, different guardrails and access. Fable 5 ships with safety classifiers for cyber, bio, and reasoning-extraction requests and is generally available. Mythos 5 has those safeguards lifted by domain and is limited to vetted programs such as Project Glasswing cybersecurity partners, with a biology trusted-access program announced as upcoming. Both are priced at $10 per million input tokens and $50 per million output tokens.

