On Claude Fable 5's launch day I watched a safety classifier fire twice on my own work. The first was a Claude Code security audit of a session-replay analytics stack. A benign recon line tripped the cyber classifier, and the session switched itself to Opus 4.8 with a notice. The second fired while I was writing about the first: drafting an analysis of how the classifier behaves was apparently security-adjacent enough to set it off. Neither request was harmful. Neither produced an error. Both came back as perfectly successful responses that happened to be answered by a different model than the one I asked for.
That pair of observations is this guide's whole subject, because it captures the two facts about refusals that are easiest to underestimate mid-migration. First: a refusal is not an error. On the Messages API it arrives as an HTTP 200 with a new stop reason, which means every alerting pipeline keyed on status codes and exception rates is structurally blind to it. Second: fallback, the thing that rescues the request, is not one mechanism. It is a different contract on each surface you run, with its own billing seams, its own observability traps, and a set of policy questions that no SDK default can answer for you. The migration guide lists refusal handling as one line item in the Opus 4.8 to Fable 5 move; I covered that contract audit in the launch-day decode. This guide is the deep version of that one line item, written for the team that has to run it in production rather than read about it: the anatomy of the refusal response, the decision between the three retry paths, the instrumentation that stays truthful when a fallback serves the turn, and the judgment calls that remain after the wiring is done. It is a model-migration artifact in one sense, but the surface it documents outlives this migration: refusals are now a first-class stop reason in the API, and they aren't going away.
What is stop_reason: refusal, and why is it not an error?
Start with the response shape, because the shape is what your code branches on. Anthropic's reference page documents that shape completely; what it can't tell you is which of its properties will bite your integration, so that's the lens here. When a Fable 5 classifier declines a request on the Claude API, you receive a normal message object: HTTP 200, a usage block, and stop_reason: "refusal". A stop_details object explains the decline. Its category field names the policy area: "cyber" for requests that could enable offensive security harm, "bio" for biology and chemistry risk, "reasoning_extraction" for attempts to pull the model's internal reasoning into response text. The category can also be null, and the docs are explicit that null is a normal, permanent value rather than a placeholder waiting to be filled in. The explanation field is human-readable text whose wording is not stable, so display it if you like, but never parse it.
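Here is a minimal sketch of branching on that shape. It assumes the response has already been decoded into a plain dict with the field names above. `refusal_info` and `category_bucket` are hypothetical helpers, not SDK functions. The code carries `explanation` only for display and never branches on it.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Category values named in the docs; None (JSON null) is also valid and permanent.
KNOWN_CATEGORIES = {"cyber", "bio", "reasoning_extraction"}


@dataclass(frozen=True)
class RefusalInfo:
    category: Optional[str]     # None is a normal value, not a placeholder
    explanation: Optional[str]  # display only: the wording is not stable


def refusal_info(message: dict[str, Any]) -> Optional[RefusalInfo]:
    """Return refusal details for a decoded Messages API response, else None.

    Detection branches on stop_reason alone; the HTTP status was 200 either way.
    """
    if message.get("stop_reason") != "refusal":
        return None
    details = message.get("stop_details") or {}  # tolerate a missing or null object
    return RefusalInfo(details.get("category"), details.get("explanation"))


def category_bucket(info: RefusalInfo) -> str:
    """Dashboard bucket: null gets its own bucket; new values are kept, not dropped."""
    if info.category is None:
        return "null"
    if info.category in KNOWN_CATEGORIES:
        return info.category
    return f"unrecognized:{info.category}"
```

The unrecognized branch is a design choice. If the taxonomy grows, a new category shows up on the dashboard under its own label instead of vanishing into a catch-all.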
Three properties of this shape deserve more attention than they get.
The refusal can land mid-stream. A classifier can fire before any output, leaving content empty, or partway through a streamed response, after your user has watched half an answer render. When nothing rescues the request, the guidance is the same in both cases: treat partial output as incomplete and discard it. The exception is a streaming fallback, where the fallback model continues from the partial text and what already rendered stays valid. The billing splits on exactly that seam, and the split matters for cost accounting: a refusal before any output is not billed and does not count against rate limits, while a mid-stream refusal bills the input tokens and everything already streamed, at Fable 5 rates, for output you are told to throw away.
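The two billing cases fit in a small settlement function. This is a sketch under stated assumptions:

- the token counts come from the refused response's usage block;
- `fallback_continued` is a flag your streaming code sets when a streaming fallback picks up the partial text;
- `settle_refused_stream` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamOutcome:
    keep_text: str             # text that may stay on screen and in history
    billed_input_tokens: int   # billed at the requested model's rates
    billed_output_tokens: int  # already-streamed output, billed even if discarded


def settle_refused_stream(partial_text: str, input_tokens: int,
                          streamed_output_tokens: int,
                          fallback_continued: bool) -> StreamOutcome:
    """Apply the documented keep/discard and billing rules to a refused stream."""
    if streamed_output_tokens == 0:
        # Declined before any output: not billed, no rate-limit capacity used.
        return StreamOutcome("", 0, 0)
    # Declined mid-stream: the input and everything already streamed is billed.
    # Partial text stays valid only if a streaming fallback continues from it.
    keep = partial_text if fallback_continued else ""
    return StreamOutcome(keep, input_tokens, streamed_output_tokens)
```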
The shape is shared by two different events. The billing cookbook draws a distinction the response itself only hints at: two different events produce stop_reason: "refusal". One is a safety-classifier block. The other is a model refusal, the model itself declining for its own policy reasons. The stop_details.category field is what tells you which one you got. If your refusal dashboard doesn't segment by category, you will conflate a tunable vendor classifier with the model's own judgment, and those two trend lines mean different things. Give the null category its own bucket when you segment; it is a documented value, not a logging bug.
And the value joins an enumeration your code probably allow-lists. I wrote about the allow-list stop-reason anti-pattern before Fable 5 existed: handlers that recognize end_turn, max_tokens, and tool_use, and treat everything else as a no-op. The stop-reason enumeration now includes a value that means "no model answered this request." A no-op branch turns that into a silent drop. The docs add one more sharp edge for batch users: in the Message Batches API a refused item comes back as a succeeded result with stop_reason: "refusal", and stop_details may be null on batch results, so the detection has to branch on the stop reason directly, not on the details object. One concrete rewrite, this week if possible: make the default branch of your stop-reason handler raise an alert on production paths instead of shrugging, and make the refusal branch a first-class path with its own logging, its own category tag, and its own decision about what the user sees.
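One way to sketch that rewrite assumes decoded response dicts and caller-supplied callbacks. `dispatch_stop_reason`, `on_refusal`, and `alert` are hypothetical names. The refusal branch keys on `stop_reason` alone, so it also catches batch results whose `stop_details` is null. Any stop reason without a registered handler raises an alert rather than falling through.

```python
from typing import Any, Callable, Optional

Handler = Callable[[dict[str, Any]], None]


def dispatch_stop_reason(
    message: dict[str, Any],
    handlers: dict[str, Handler],
    on_refusal: Callable[[Optional[str], dict[str, Any]], None],
    alert: Callable[[str], None],
) -> str:
    reason = message.get("stop_reason")
    if reason == "refusal":
        # First-class path. stop_details can be null (batch results),
        # so detection never depends on the details object.
        details = message.get("stop_details") or {}
        on_refusal(details.get("category"), message)
        return "refusal"
    handler = handlers.get(reason)
    if handler is None:
        # The default branch alerts instead of silently dropping the request.
        alert(f"unhandled stop_reason {reason!r}")
        return "unhandled"
    handler(message)
    return reason
```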
Why your error dashboards never see a refusal
Anthropic's own integration guidance says the quiet part out loud:
A refusal is an HTTP 200, so monitoring built on error rates or 5xx responses never sees it.
Sit with what that means operationally. Every transport-level reliability signal a production API team already has (status-code ratios, exception counts, retry storms, dead-letter queues) reports a healthy system while a classifier quietly declines some slice of your traffic. The failure mode is not noise; it is silence. This is the same class of problem I catalog in agent reliability in production: the failures that matter most are the ones your existing instruments were never pointed at.
The fix is to make refusals their own first-class signal in your observability stack, and the docs spell out the minimal version: emit one event per refusal and one event per fallback-served response, then alert on the gap between the two counts. If your policy retries every refusal synchronously on the same path, that gap approximates the requests that died with no model answering; once you adopt per-category policies, log a terminal outcome per turn (served, fallback-served, surfaced, blocked) rather than trusting the subtraction. Segment both events by stop_details.category and you get the breakdown that turns an abstract rate into a diagnosis: a cyber spike on a security-tooling workload is a false-positive problem to measure and route around, while a reasoning_extraction floor that never goes to zero usually traces back to your own prompts, the input you control, which the next sections cover.
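Here is a minimal in-memory sketch of the two events, the gap between them, and the per-turn terminal outcome. `RefusalMetrics` and `Outcome` are hypothetical names, and a real deployment would emit these counts to its metrics backend. The sketch also assumes that the category bucket for a fallback-served response is carried over from the refusal that triggered it.

```python
from collections import Counter
from enum import Enum


class Outcome(str, Enum):
    SERVED = "served"
    FALLBACK_SERVED = "fallback_served"
    SURFACED = "surfaced"
    BLOCKED = "blocked"


class RefusalMetrics:
    def __init__(self) -> None:
        self.refusals: Counter = Counter()         # keyed by category bucket
        self.fallback_served: Counter = Counter()  # keyed by category bucket
        self.outcomes: Counter = Counter()         # one terminal outcome per turn

    def record_refusal(self, bucket: str) -> None:
        self.refusals[bucket] += 1

    def record_fallback_served(self, bucket: str) -> None:
        self.fallback_served[bucket] += 1

    def record_turn(self, outcome: Outcome) -> None:
        self.outcomes[outcome] += 1

    def gap(self, bucket: str) -> int:
        """Refusals with no matching fallback-served response.

        Approximates dead requests only when every refusal is retried
        synchronously on the same path; otherwise read self.outcomes.
        """
        return self.refusals[bucket] - self.fallback_served[bucket]
```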
The subtler trap is attribution. Once any fallback mechanism is live, the model you requested and the model that answered are different facts, and the response's top-level model field reports the serving model. The cookbook's warning deserves to be framed on the wall of every team running mixed fleets: analytics recorded against the requested model will be wrong whenever a fallback is used. The truthful per-turn record is the usage.iterations array, which logs every attempt: the model that declined appears as a message entry with its token counts, and the model that served appears as a fallback_message entry. That array is simultaneously your billing receipt (each attempt that produced output bills at its own model's rates; a pre-output decline costs nothing and consumes no rate-limit capacity) and your serving-model log. Build quality dashboards, cost attribution, and eval segmentation from it, not from the request you sent. That array is the record on the API-managed paths; a manual retry is two separate responses, so on that path the attempt log is yours to assemble.
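Below is a minimal sketch of attribution that reads that array rather than the request. Three assumptions apply:

- The entry shape is assumed. The text says only that entries are typed `message` or `fallback_message` and carry token counts.
- The model IDs and per-million-token prices in `prices` are placeholders.
- Cache pricing is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical price table: model id -> (input, output) USD per million tokens.
Prices = dict[str, tuple[float, float]]


@dataclass(frozen=True)
class Attribution:
    requested_model: str
    serving_model: str
    fell_back: bool
    cost_usd: float


def attribute_turn(requested_model: str, response: dict[str, Any], prices: Prices) -> Attribution:
    usage = response.get("usage", {})
    # ASSUMPTION: entries look like {"type", "model", "input_tokens", "output_tokens"};
    # with no iterations array, treat the top-level usage as a single attempt.
    iterations = usage.get("iterations") or [{
        "type": "message",
        "model": response["model"],
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }]
    cost = 0.0
    for it in iterations:
        # Each attempt bills at its own model's rates. ASSUMPTION: an unbilled
        # pre-output decline reports zero billable tokens in its entry.
        inp, out = prices[it["model"]]
        cost += (it.get("input_tokens", 0) * inp + it.get("output_tokens", 0) * out) / 1_000_000
    return Attribution(
        requested_model=requested_model,
        serving_model=response["model"],  # the top-level model field names who served
        fell_back=any(it.get("type") == "fallback_message" for it in iterations),
        cost_usd=cost,
    )
```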
Sticky routing is where naive attribution breaks completely. Once a conversation has fallen back, later requests that include the fallbacks parameter are served directly by the model that rescued the earlier turn, for approximately one hour, scoped to your organization, without re-trying the requested model at all. The design is sensible: it avoids re-paying for an attempt that would predictably decline again on every turn. But a sticky-served turn carries no fallback content block, because nothing declined on that turn. The detection recipe is indirect: a fallback_message entry in usage.iterations, no message entry for the requested model, and the model field naming the understudy. Miss this and you get the ghost story version of a production incident: an hour of Opus 4.8 answers in a workload you believe is running Fable 5, with response-quality metrics drifting and nothing in your logs that says why. The routing decision is also explicitly best-effort, so the requested model can be re-tried at any time. Both directions of surprise are in play, which is exactly why the instrumentation has to read what served, never what was asked.
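The indirect recipe translates into a small classifier over the response. It rests on the same assumption as any attribution code: each `usage.iterations` entry carries `type` and `model` keys. `classify_turn` is a hypothetical helper. The `unexplained` result exists so that a serving-model mismatch with no fallback record raises an alert instead of being silently bucketed.

```python
from typing import Any, Literal

TurnKind = Literal["direct", "fallback_this_turn", "sticky_served", "unexplained"]


def classify_turn(requested_model: str, response: dict[str, Any]) -> TurnKind:
    if response.get("model") == requested_model:
        return "direct"
    iterations = response.get("usage", {}).get("iterations") or []
    has_fallback_entry = any(it.get("type") == "fallback_message" for it in iterations)
    requested_attempted = any(
        it.get("type") == "message" and it.get("model") == requested_model
        for it in iterations
    )
    if has_fallback_entry and requested_attempted:
        return "fallback_this_turn"
    if has_fallback_entry:
        # Sticky routing: the understudy served without re-trying the
        # requested model, so this turn has no fallback content block.
        return "sticky_served"
    return "unexplained"  # serving model differs with no fallback record: alert
```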
Which fallback path should you choose?
Detection is half the contract. The other half is what happens next, and here the answer depends on which surface you're standing on, because the retry mechanics differ by platform in ways the launch coverage mostly skipped.
| Your situation | The path | What it gives you |
|---|---|---|
| Claude API or Claude Platform on AWS | Server-side fallback (beta) | One request, one response; the API runs the retry and returns the model that answered |
| Any platform, with the TypeScript, Python, Go, Java, or C# SDK (the route for Bedrock, Vertex AI, and Microsoft Foundry) | SDK refusal-fallback middleware | Client-side retry configured once on the client; also a valid beta-free alternative on the Claude API |
| Ruby, PHP, raw HTTP, or custom routing logic | Manual detect-and-retry plus fallback credit | Full control; you own the retry, the repricing, and the conversation pinning |
| Claude apps (web, desktop, mobile, Claude Code) | Automatic switch with a visible notice | No code; the surface handles it and labels the serving model |
Server-side fallback is the lowest-wiring option: name up to three fallback models on the request, send the server-side-fallback-2026-06-01 beta header, and a declined request is retried inside the same API call. The response names the serving model and marks the handoff with a fallback content block. The constraints are worth knowing before you commit. The parameter is in beta on the Claude API and Claude Platform on AWS only; it is rejected on the Message Batches API and absent on Bedrock, Vertex AI, and Microsoft Foundry. Batches get a recipe instead of a parameter: collect the refused items (they come back as succeeded results carrying the refusal stop reason), strip Fable 5's thinking blocks from any multi-turn histories, and resubmit on a fallback model as a new batch or as direct requests.
Fallback entries must come from the requested model's permitted-target list, and at launch Fable 5's published list is Opus 4.8. That carries a quiet architectural consequence: adopt the vendor fallback path, and the model you're migrating away from stays in your dependency graph as the designated understudy. And only a safety-classifier decline triggers the chain. A rate limit, an overload, or a server error on the requested model is returned to you as-is, which means your existing availability-retry logic stays, and the refusal path sits beside it rather than replacing it.
One more operational seam: each attempt counts against its own model's rate limits, and if the fallback model is rate limited the retry simply is not made; you get the refusal back with a stop_details.recommended_model hint. A fallback chain you never load-tested is a chain that quietly returns refusals at exactly the moment your traffic peaks. Coverage has its own gaps too, subagent calls among them; those are policy questions I return to below.
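The batch recipe is mechanical enough to sketch. The sketch makes these assumptions:

- you kept the original `params` for each `custom_id`;
- results follow the Message Batches shape of `{"custom_id", "result": {"type", "message"}}`;
- `strip_thinking` and `resubmission_requests` are hypothetical helpers;
- the fallback model ID is a placeholder.

```python
import copy
from typing import Any, Iterable

THINKING_TYPES = {"thinking", "redacted_thinking"}


def strip_thinking(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop thinking blocks from assistant turns before a cross-model retry."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") == "assistant" and isinstance(content, list):
            msg = {**msg, "content": [b for b in content if b.get("type") not in THINKING_TYPES]}
        cleaned.append(msg)
    return cleaned


def resubmission_requests(
    original: dict[str, dict[str, Any]],  # custom_id -> params from the first batch
    results: Iterable[dict[str, Any]],
    fallback_model: str,
) -> list[dict[str, Any]]:
    out = []
    for r in results:
        res = r.get("result", {})
        msg = res.get("message") or {}
        # Refused items come back as *succeeded*; key on stop_reason only,
        # because stop_details may be null on batch results.
        if res.get("type") != "succeeded" or msg.get("stop_reason") != "refusal":
            continue
        params = copy.deepcopy(original[r["custom_id"]])  # never mutate the originals
        params["model"] = fallback_model
        params["messages"] = strip_thinking(params["messages"])
        out.append({"custom_id": r["custom_id"], "params": params})
    return out
```

The output can go out as a new batch or as individual direct requests, matching the two options in the recipe.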
The SDK middleware moves the same logic client-side. It works on any platform, including the Claude API itself, which makes it both the path for the platforms the beta doesn't cover and the choice for a team that would rather not take a beta dependency. You configure the fallback list once on the client constructor, share a fallback-state object across a conversation's requests so follow-ups stay pinned to the model that accepted (the client-side cousin of the server-side sticky routing above), and the middleware handles the rest, including two chores you don't want to hand-write: it strips Fable 5's thinking blocks from cross-model retries and manages them in conversation history afterward, and it sends the fallback-credit beta header on every request it touches. Coverage is TypeScript, Python, Go, Java, and C#; Ruby and PHP teams write the detect-and-retry pattern directly for now. The docs are emphatic on one point: middleware or the server-side parameter, never both on the same request.
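Here is a client-side sketch of what a shared fallback-state object does, to make the pinning idea concrete. It is not the SDK's middleware or its API. `ConversationFallbackState`, `send_with_fallback`, and the `send` callable are hypothetical. The sketch also skips the thinking-block and credit-header chores the real middleware performs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ConversationFallbackState:
    """Hypothetical stand-in for a shared per-conversation fallback state."""
    requested_model: str
    fallbacks: list[str] = field(default_factory=list)
    pinned_model: Optional[str] = None

    def candidates(self) -> list[str]:
        # Once pinned, follow-ups go straight to the model that accepted.
        if self.pinned_model is not None:
            return [self.pinned_model]
        return [self.requested_model, *self.fallbacks]

    def record_accept(self, model: str) -> None:
        if model != self.requested_model:
            self.pinned_model = model


def send_with_fallback(
    state: ConversationFallbackState,
    send: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Try candidates in order; pin the conversation to whichever one accepts."""
    response: dict[str, Any] = {}
    for model in state.candidates():
        response = send(model)
        if response.get("stop_reason") != "refusal":
            state.record_accept(model)
            return response
    return response  # every candidate refused: hand it to your refusal policy
```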
Manual retry is where the billing mechanics surface, and they reward attention because for long cached prompts and agent contexts the dominant cost in a retry is the prompt cache. A hand-rolled retry re-writes the fallback model's prompt cache from scratch at cache-write rates. Fallback credit exists to refund exactly that: when a blocked Fable 5 request carried a billable cached prefix, the refusal carries a credit token in its stop_details, and a retry that redeems it within the 5-minute window reprices the repeated prefix at the cache-read rate, 10 percent of the base input price, instead of the cache-write rate. The catch is strictness: redemption requires the prompt-shaping fields (system, messages, tools, tool choice, thinking, and cache settings) to be byte-identical to the blocked request, so the thinking-block stripping that is good hygiene on a plain retry is exactly the thing you skip on a credit-redeeming one. The server-side path and the middleware apply the credit for you; only the hand-rolled path has to think about it, which is itself an argument for not hand-rolling unless you have a routing, audit, or compliance requirement the packaged paths cannot express.
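The credit's preconditions and its payoff can be sketched in a few lines, with three assumptions:

- The field tuple stands in for the documented list.
- The 1.25x cache-write multiplier is an assumed figure; the text gives only the 10 percent read rate.
- Both functions are hypothetical pre-flight helpers, not part of any SDK.

```python
from typing import Any

# ASSUMPTION: illustrative request-field names for "system, messages, tools,
# tool choice, thinking, and cache settings"; check the exact names in the docs.
PROMPT_SHAPING_FIELDS = ("system", "messages", "tools", "tool_choice", "thinking", "cache_control")
CREDIT_WINDOW_S = 5 * 60
CACHE_READ_MULT = 0.10   # from the text: 10 percent of the base input price
CACHE_WRITE_MULT = 1.25  # ASSUMPTION: a typical 5-minute cache-write multiplier


def credit_redeemable(blocked: dict[str, Any], retry: dict[str, Any],
                      blocked_at_s: float, now_s: float) -> bool:
    """Pre-flight check before redeeming: inside the window, prompt fields unchanged.

    Equal decoded values are necessary but do not prove byte identity;
    serializing these fields once and reusing the same bytes is safer.
    """
    if now_s - blocked_at_s > CREDIT_WINDOW_S:
        return False
    return all(blocked.get(f) == retry.get(f) for f in PROMPT_SHAPING_FIELDS)


def repeated_prefix_cost(prefix_tokens: int, base_input_per_mtok: float, redeemed: bool) -> float:
    """USD cost of re-sending a cached prefix to the fallback model."""
    mult = CACHE_READ_MULT if redeemed else CACHE_WRITE_MULT
    return prefix_tokens * base_input_per_mtok * mult / 1_000_000
```

Take a 100,000-token prefix at a hypothetical $5 per million input tokens. Under the assumed multiplier, redeeming the credit cuts the repeated prefix from $0.625 to $0.05.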
The apps surface is the contract you don't write but should still know, because your team lives on it. In the Claude apps, a flagged request switches to Opus 4.8 automatically with a visible notice and the response is labeled by the model that answered; the switch is on by default and can be turned off, in which case a blocked request pauses instead, and the rerun bills at the serving model's rates rather than free. Both of my launch-day classifier fires resolved exactly this way: a notice, a model switch, an uninterrupted session. For assistant-shaped products, that experience is the bar your API integration is implicitly measured against: a consumer surface that degrades gracefully and visibly sets the expectation, and an API integration that silently drops the same request sits below it. For workflow, eval, or regulated products, the right bar may instead be an explicit, auditable refusal, which is exactly where the next section goes.
What belongs in a refusal policy that no SDK default can decide?
Everything above is wiring. What remains is policy, and the policy layer is where I see teams stop short, because the SDK examples make fallback look like a solved default rather than a set of decisions. Four of those decisions belong to you, not to the middleware, and they are governance questions as much as engineering ones; I treat the general discipline in agentic AI governance in production.
First: is silent fallback even the right behavior for your workload? The packaged paths answer a refusal by re-running the request on Opus 4.8, and for low-risk assistive traffic, where continuity matters more than model identity, that is the right default. But a refusal is information, and some consumers of that information should see it. A security-research platform may want a cyber refusal surfaced to the user with an explanation rather than silently served by a different model with different capabilities. A regulated workflow may need the refusal logged as a compliance event before any retry. And any workload where the two models differ materially in quality needs to decide whether an unlabeled understudy answer is acceptable product behavior, because the API, unlike the apps, shows your user no notice. Per category, the honest options are retry, surface, or block, and choosing per category is the policy. There is also a class of work where the right answer is not to route to Fable 5 at all: a workload bound by zero-data-retention commitments cannot run on a Covered Model (Anthropic's designation for models with mandatory retention) in the first place, and a routing layer that respects the model-by-effort matrix should encode retention constraints alongside cost, as I argued when the matrix gained its Fable 5 row.
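Writing the per-category choice down as data keeps it reviewable. This is a sketch with hypothetical names. The example policy shows illustrative choices for a security-research product, not a recommendation. The set of Covered Models is an input you would fill from Anthropic's own designation.

```python
from enum import Enum
from typing import Optional


class Action(str, Enum):
    RETRY = "retry"      # serve the turn from a fallback model
    SURFACE = "surface"  # show the refusal to the user, with an explanation
    BLOCK = "block"      # stop and record a compliance event; no retry


def decide(category: Optional[str], policy: dict[Optional[str], Action],
           default: Action = Action.SURFACE) -> Action:
    # A null category is looked up under its own key (None), not lumped with unknowns.
    return policy.get(category, default)


def routable(models: list[str], covered_models: set[str], zdr_bound: bool) -> list[str]:
    """Drop Covered Models before any request is sent for a ZDR-bound workload."""
    return [m for m in models if not (zdr_bound and m in covered_models)]


# Illustrative choices for a hypothetical security-research product.
SECURITY_RESEARCH_POLICY: dict[Optional[str], Action] = {
    "cyber": Action.SURFACE,
    "bio": Action.BLOCK,
    "reasoning_extraction": Action.RETRY,
    None: Action.SURFACE,
}
```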
Second: where do refusals come from in your own stack? One category is partly self-inflicted. Anthropic's Fable 5 prompting guidance warns that prompts, skills, or harness instructions telling the model to echo or explain its internal reasoning in response text can trigger the reasoning_extraction category, raising fallback rates on requests that have nothing to do with security or biology. Those instructions were harmless habits on earlier models. The migration step is an audit of your system prompts and agent instructions for show-your-thinking phrasing, replacing it with reads of the structured thinking blocks adaptive thinking already returns. This is the cheapest refusal reduction available to a migrating team, and no amount of fallback wiring substitutes for it.
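A first pass at that audit can be a plain pattern scan over your prompt files. The phrase list is an assumption: a starting set of show-your-thinking wordings, not anything Anthropic publishes. `audit_prompts` is a hypothetical helper. A hit marks a line to review, not proof that it will trip the classifier.

```python
import re

# ASSUMPTION: illustrative phrasings only; extend with your own prompt habits.
SHOW_YOUR_THINKING = [
    r"\bshow (?:your|all) (?:work|reasoning|thinking)\b",
    r"\bexplain your (?:reasoning|thought process)\b",
    r"\bthink (?:out loud|aloud)\b",
    r"\b(?:echo|repeat|include) your (?:reasoning|thinking|chain of thought)\b",
]
PATTERN = re.compile("|".join(SHOW_YOUR_THINKING), re.IGNORECASE)


def audit_prompts(prompts: dict[str, str]) -> dict[str, list[str]]:
    """Map prompt name -> matched phrases that may trip reasoning_extraction."""
    findings: dict[str, list[str]] = {}
    for name, text in prompts.items():
        hits = [m.group(0) for m in PATTERN.finditer(text)]
        if hits:
            findings[name] = hits
    return findings
```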
Third: does your fallback coverage reach the requests most likely to need it? The docs' own pitfall list is a map of the gaps I would audit first. The fallbacks parameter does not propagate into model calls made from inside tool execution, so an agent's subagent calls each need their own fallback configuration; an orchestrator with a protected top-level call and a dozen unprotected workers has mostly unprotected traffic. Size retry capacity per request rather than per turn or per session, because a single agentic turn can produce several refusals across its subagents; then pair the per-request cap with an aggregate per-turn spend ceiling so a fan-out turn cannot multiply retries across its workers. And fallback should be a property of the request, not of ambient state: error-recovery branches and background workers that rebuild requests are exactly where a global toggle drifts out of sync.
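The two limits compose into a small budget object shared by one agentic turn. This sketch uses hypothetical names and caller-supplied cost estimates. The per-request cap stops one worker from retrying forever. The aggregate ceiling stops a fan-out from multiplying retries across many workers.

```python
from dataclasses import dataclass, field


@dataclass
class TurnRetryBudget:
    max_retries_per_request: int
    max_turn_spend_usd: float
    spent_usd: float = 0.0
    retries: dict[str, int] = field(default_factory=dict)

    def allow_retry(self, request_id: str, est_cost_usd: float) -> bool:
        """Grant a fallback retry only if both the cap and the ceiling allow it."""
        used = self.retries.get(request_id, 0)
        if used >= self.max_retries_per_request:
            return False  # per-request cap reached
        if self.spent_usd + est_cost_usd > self.max_turn_spend_usd:
            return False  # aggregate per-turn ceiling across all subagents
        self.retries[request_id] = used + 1
        self.spent_usd += est_cost_usd
        return True
```

Create one budget per top-level turn and pass it to every subagent call the turn spawns. That keeps the limit a property of the request path rather than of ambient global state.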
And fourth, the one I'd start with: what is your real refusal rate? Don't adopt anyone else's number, including Anthropic's. The launch announcement puts classifier fires at under 5 percent of sessions and concedes in the same breath that the classifiers launched "stricter than would be ideal," with false positives expected. Both claims are self-reported and launch-week fresh. Anthropic's own system card shows how workload-dependent the rate is: on the Terminal-Bench 2.1 agentic coding evaluation, 20.9 percent of Fable 5 trials hit a safety refusal and fell back to Opus 4.8 for the rest of the trajectory. A session average under 5 percent and a benchmark rate near one in five are both true at once; the difference is the workload. Meanwhile the day-one practitioner roundups collected false positives on plainly benign biology and security questions. My own day zero matched the spread: one full working session with zero fires, and two fires in a single day on security work, one on the audit itself and one on my writeup about it. A launch-wide average is not your rate. Sample your own traffic, segment by category and by workload, and let the measured rate, not the headline one, size your fallback capacity and your alerting thresholds. That is the same instrumented-before-default discipline I applied to the Opus 4.8 upgrade, pointed at a new question.
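Measuring your own rate is a counting exercise, but small samples mislead. This sketch therefore pairs each rate with a Wilson score interval, a standard binomial interval swapped in here; nothing in Anthropic's guidance prescribes it. Samples are assumed to be `(workload, bucket)` pairs. `bucket` is a category bucket string for a refused request and `None` for one that was not refused.

```python
import math
from collections import defaultdict
from typing import Iterable, Optional


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for k refusals observed in n sampled requests."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def refusal_rates(samples: Iterable[tuple[str, Optional[str]]]):
    """Per workload: overall and per-bucket (rate, interval)."""
    totals: dict[str, int] = defaultdict(int)
    refused = defaultdict(lambda: defaultdict(int))
    for workload, bucket in samples:
        totals[workload] += 1
        if bucket is not None:
            refused[workload][bucket] += 1
    out = {}
    for w, n in totals.items():
        k_all = sum(refused[w].values())
        out[w] = {"overall": (k_all / n, wilson_interval(k_all, n))}
        for b, k in refused[w].items():
            out[w][b] = (k / n, wilson_interval(k, n))
    return out
```

Even a 0-of-100 sample leaves an upper bound near 4 percent. A clean week on light traffic therefore does not show that a workload is refusal-free.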
Won't classifier tuning make this guide unnecessary?
The strongest objection to investing in any of this is that the sharpest pain is explicitly temporary. Anthropic has said it will keep reducing false-positive rates after launch, and history backs the trajectory: the false-positive wave that followed Opus 4.7's cyber safeguards in April, which I covered at the time, subsided as the classifiers were tuned. On that view, the 20.9 percent benchmark figure and the launch-week anecdotes will read as period pieces within a quarter, and a team that waited will have skipped a round of panic wiring. There is a version of this argument I agree with: if your plan is to hard-code routing rules around the exact day-one trigger behavior of specific classifiers, don't. That behavior is the most perishable layer of the stack, and this guide flags every number that will date.
But the objection confuses the rate with the contract. What Fable 5 shipped is not a temporarily twitchy filter; it is a permanent change to the API's failure semantics. stop_reason: "refusal", the stop_details taxonomy, the fallbacks parameter, the fallback content block, usage.iterations as the per-attempt record: that surface is versioned, documented, and now baked into the SDKs of five languages. It doesn't disappear when the false-positive rate halves. A lower rate changes how often the path executes, not whether the path exists, and a path that executes rarely is precisely the path that fails silently for months when it is not instrumented. The refusal-rate dashboard you build this month is also the instrument that tells you when the tuning has landed for your workload, which is information you cannot get any other way. The happy-path wiring is an afternoon with the packaged paths; a production rollout adds the telemetry, the tests, and the policy decisions, and those are not things the tuning will do for you. Classifier improvement changes the answer to "how often"; it never answers "what should happen when."
How this guide stays current
The sources under this guide move on two clocks, and I treat them differently. The API contract (response shapes, parameter and header names, platform availability, billing rules) lives on continuously maintained Anthropic doc roots, and I verify this page against those docs rather than against snapshots; if Anthropic graduates the fallback beta, expands the permitted-target list beyond Opus 4.8, or extends sticky routing to streaming, those are contract changes and this page updates to match. The refusal-rate evidence (the under-5-percent figure, the Terminal-Bench measurement, the day-one false-positive reports) is a launch-window snapshot by nature, dated June 9, 2026 in the sources, and I keep it explicitly framed as perishable: it documents what launch week looked like, not what your traffic will see. When the measured picture changes materially, the body text changes with it rather than leaving stale numbers standing.
The contract is the durable part
A refusal on Claude Fable 5 is a successful response that no model answered, and the entire discipline of handling it follows from taking that sentence seriously. Detection: branch on the stop reason, never on status codes, with the default branch alerting instead of dropping. Observability: refusals and fallback-served turns as first-class events, attribution from usage.iterations, sticky routing accounted for. Routing: the packaged path that fits your surface, sized for the rate you measured, reaching the subagent calls the defaults miss. Policy: per-category decisions about retry, surface, or block that someone on your team made on purpose. None of it is exotic, and the packaged paths make the wiring genuinely short. What separates teams is whether anyone treated the refusal path as a real path before the first silent week of drops made the argument for them.
If you are running this migration now, the launch-day contract decode is the five-step audit, this guide is the depth behind its refusal step, and the safety-tier breakdown maps the two other kinds of "safe" a refusal handler never catches: the silent-degradation tier that steers an answer with no stop reason at all, and the model's own diligence failures. And if you want a second set of eyes on your specific surface mix, the per-category policy, or the instrumentation, this is work I do with engineering teams: walking your highest-volume call site through this contract is a short session, and it usually surfaces the one unprotected request path that would have been the incident.