Anthropic instrumented its own product and published the number: Claude Code users approve 93 percent of permission prompts. Read that as a security control and it's a strange one. A gate that opens 93 times out of 100 is not a gate. It is a toll booth, and the toll is paid in the scarcest resource an agentic workflow has: operator attention. Anthropic's engineering team has a name for what the toll does to people. They call it approval fatigue, the state where users might not pay close attention to what they're approving, "in turn making development less safe."

That phrase comes from the vendor that builds the prompts, not from a critic. And it points at a model I want to defend in full: approval prompts are a budget, not a safety layer you can stack for free. Every prompt spends a unit of human attention. Spend it on prompts nobody can genuinely evaluate, and the system trains its operators to approve the next one without looking. Past that point, adding a gate does not add safety. It subtracts from the gates you already had.

What approval fatigue is, and what it costs

Approval fatigue is the erosion of human oversight that sets in when permission prompts arrive faster than an operator can genuinely evaluate them. Each prompt that gets approved without scrutiny does two things: it authorizes the action in front of it, and it lowers the bar for the prompt behind it. The safeguard does not fail loudly. It hollows out while continuing to look like oversight.

The numbers around this are not subtle.

93% of Claude Code permission prompts get approved (Anthropic telemetry)
84% fewer prompts under OS sandboxing (Anthropic internal measurement)

The 93 percent figure is Anthropic's own telemetry from the auto mode engineering post. The 84 percent prompt reduction under sandboxing comes from the same team's internal usage; they published no sample size or methodology, so treat it as a vendor's directional claim rather than an audited benchmark. The closest mature analogue we have is the security operations center, the team that triages a company's security alerts, where a 2025 survey of human-AI collaboration research aggregates industry reports of roughly 4,484 alerts per SOC per day, 67 percent of them ignored. Different domain, same species of failure: a signal channel sized past what humans can process stops being a signal channel.
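It helps to see what the two vendor figures mean for one operator's day. The arithmetic is simple, so here is a minimal Python sketch. It assumes a hypothetical stream of 90 prompts a day, the same round number used later in this piece. It also treats both percentages as directional inputs, not measurements of your team.

```python
# Hypothetical stream size: an illustrative round number, not telemetry.
DAILY_PROMPTS = 90
APPROVAL_RATE = 0.93      # Anthropic telemetry cited above
SANDBOX_REDUCTION = 0.84  # Anthropic's directional, unaudited claim


def remaining_prompts(daily: float, reduction: float) -> float:
    """Prompts that still reach a human after a layer retires a fraction of them."""
    return daily * (1.0 - reduction)


def expected_approvals(prompts: float, approval_rate: float) -> float:
    """Prompts expected to be approved at a given approval rate."""
    return prompts * approval_rate


if __name__ == "__main__":
    print(f"approved per day, no sandbox: {expected_approvals(DAILY_PROMPTS, APPROVAL_RATE):.1f}")
    print(f"prompts per day, sandboxed:   {remaining_prompts(DAILY_PROMPTS, SANDBOX_REDUCTION):.1f}")
```

On those assumptions, the unsandboxed stream yields about 83.7 approvals a day. Sandboxing leaves about 14.4 prompts. The sketch says nothing about whether those prompts get read. It only shows how much one structural layer shrinks the draw on attention.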

On a working day I run between four and eight concurrent Claude Code sessions, so this stopped being abstract for me a while ago. Security operators got to the failure mode first; agentic development is catching up fast. At that concurrency, prompt volume is not an edge case. It's the workload. The question stops being "should a human be in the loop?" and becomes "which loop can the human afford to be in?"

A fair pushback before going further: a high approval rate is not, by itself, proof of rubber-stamping. If an agent's work is predictable, approving most of it is the correct call. But that concession is the argument. A decision predictable enough to approve every single time is a decision a rule should be making, so a sustained 93 percent approval rate is strong evidence that the prompt stream is carrying work that belongs to configuration, with the genuinely ambiguous seven percent buried in it. The signal sharpens when the approvals cluster in repeated prompt classes you could name without looking.
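You can check the clustering claim against your own data if you keep a log of prompts and decisions. Here is a minimal Python sketch, under two assumptions. First, each record carries an action-class label you chose and the decision made. Second, the record shape, class names and thresholds are all hypothetical. As far as I know, Claude Code does not export this log in this shape, so you would assemble it yourself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PromptEvent:
    action_class: str  # hypothetical bucket label, e.g. "bash:npm test"
    approved: bool


def overall_rate(events: List[PromptEvent]) -> float:
    return sum(e.approved for e in events) / len(events)


def class_stats(events: List[PromptEvent]) -> Dict[str, Tuple[int, float]]:
    """Map each action class to (prompt count, approval rate)."""
    totals, approvals = Counter(), Counter()
    for e in events:
        totals[e.action_class] += 1
        approvals[e.action_class] += e.approved  # bool counts as 0 or 1
    return {cls: (totals[cls], approvals[cls] / totals[cls]) for cls in totals}


def retire_candidates(events: List[PromptEvent], min_count: int = 10,
                      unanimity: float = 1.0) -> List[Tuple[str, str]]:
    """Frequent classes decided the same way every time: candidates for a rule."""
    out = []
    for cls, (count, rate) in sorted(class_stats(events).items()):
        if count < min_count:
            continue  # too few samples to trust the pattern yet
        if rate >= unanimity:
            out.append((cls, "allow"))
        elif rate <= 1.0 - unanimity:
            out.append((cls, "deny"))
    return out


# Hypothetical log: two routine classes and one rare, contested one.
log = (
    [PromptEvent("bash:npm test", True)] * 40
    + [PromptEvent("read:src", True)] * 30
    + [PromptEvent("bash:git push", True)] * 3
    + [PromptEvent("bash:git push", False)] * 2
)

if __name__ == "__main__":
    print(f"overall approval rate: {overall_rate(log):.0%}")
    print(retire_candidates(log))
```

The overall rate here is high, and almost all of it comes from two classes you could name without looking. Those two become rule candidates. The contested `git push` class keeps its prompt.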

That is a budgeting question. Which is the thesis: the prompt is not the safety mechanism. The attention behind it is. And attention is finite while prompts are not.

Why more review makes it worse

Here is the belief I am writing against, in the form I keep hearing it: rubber-stamping is a discipline problem. If operators paid attention, or if we added one more checkpoint for the risky stuff, human-in-the-loop would keep us safe.

The human-factors literature has been refuting that belief for decades, and the refutation got specific about AI agents this year. A Microsoft Research study published this month watched 17 developers supervise agentic coding tools and found that real-time monitoring was "rarely performed." Developers reviewed by proxy instead. The plan stood in for what the agent did; passing tests stood in for correctness. The authors frame these as bounded-rationality shortcuts, rational responses to systems too fast and too verbose to audit exhaustively. Seventeen developers is a small qualitative sample, but the shortcut pattern it documents is the one the older oversight literature predicts. Not laziness. Arithmetic.

The older oversight literature explains why training doesn't rescue the situation. An interdisciplinary review of human oversight effectiveness identifies four conditions that all have to hold for oversight to be more than theater: the overseer needs causal power over the system, epistemic access to what it is doing, self-control in the moment, and intentions that fit the role. In plain terms: the power to stop the thing, enough visibility to know what it is doing, enough focus left to use both, and goals that match the job. Volume attacks the second and third directly. The review names the resulting states: learned carelessness, exhaustion, motivation loss. Worse, automation bias bites hardest when the system is usually right, because redirecting attention away from a reliable system is the locally rational move. Every reliable agent run you approve is evidence, to your own brain, that the next one doesn't need reading.

I have watched the same automation-bias mechanics play out in AI code review, where the complacency literature maps almost one to one. But the incident that made the budget model concrete for me was quieter, and it happened in my own repository. I run a fleet of guard hooks in this codebase, deterministic scripts that veto specific commands before they execute. One of them gated pull-request merges behind a cleanup wrapper. It worked in every test. Then it was bypassed in four consecutive working sessions, and not by the model getting clever. By me. At some point I had added a broad allow rule for gh commands, one more convenience approval in a stream of them, and in the Claude Code build I was running at the time, an allow-rule match skipped the guard hook entirely. A rule I added with thirty seconds of attention silently disarmed a guardrail I had built with hours of it. Four sessions ran before I noticed. The current docs state that a blocking hook takes precedence over allow rules, so verify the interaction on your own version; the lesson about casual allow rules outranking deliberate controls does not depend on which build you are on.

Nobody in that story lacked discipline. The structure routed around the human, because one approval, made casually, outranked the deliberate control. That's what a spent budget looks like from the inside: you don't feel fatigued, you feel efficient.

So the misconception fails on both ends. Operators can't will themselves into attention that volume has already spent, and the extra checkpoint you add becomes one more prompt in the stream that spent it. None of this indicts review as a category: batched review, sampled audits, and policy review upstream of the pipeline all move judgment out of the hot path. The failure mode is specifically the low-context prompt that interrupts at run time. If the budget is the constraint, the fix has to change what draws on the budget. That is structural work, not motivational work.

Where a prompt dies before it reaches you

The structural fix is unglamorous: arrange the layers so that almost every tool call gets settled by a deterministic rule, and the only calls that reach a human are the ones a human can meaningfully judge. Claude Code's permission system is a good concrete example of that layering. As of June 2026 it gives you a precedence chain with three rule types and a set of permission modes that decide what happens when no rule matches.

Figure: Where a tool call gets settled. The chain for a single tool call in Claude Code runs top to bottom:
- Deny rules are checked first. A match blocks the call with no attention spent.
- Ask rules are checked next. A match routes the call to an operator prompt.
- Allow rules are checked third. A match runs the call with no attention spent.
- When nothing matches, the permission mode default decides. A sandboxed or dontAsk run is settled without a prompt; the default ask path reaches the operator.
Only the paths that reach the operator spend the approval budget.
Every layer above the prompt retires a decision before it costs attention.

Read that diagram as a budget instrument, not as a flowchart. Deny rules encode "never, regardless of context" decisions: made once, enforced forever, costing zero attention per call. Allow rules encode "always fine" decisions on the same terms. Hooks sit beside them as programmable gates: a PreToolUse hook can veto a call with a script, and per the docs a hook cannot override a deny rule. The interaction with allow rules is where I got burned: on the build I was running, a casually added allow rule kept my guard hook from ever firing. Below all of that sits the OS sandbox, which changes the default for contained commands from "ask" to "run," with OS-level enforcement instead of model-level promises. That is the layer behind Anthropic's 84 percent prompt-reduction figure.
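To make the precedence concrete, here is a minimal Python model of the chain in the diagram. It is not Claude Code's implementation. Rules are plain Python predicates, and the helper names `prefix_rule` and `tool_rule` are hypothetical. It also ignores hooks and the finer sandbox and ask-rule interactions. What it does show is that only one outcome, `"prompt"`, spends attention.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List

# A rule is modeled as a predicate over (tool name, argument string).
Rule = Callable[[str, str], bool]


def prefix_rule(tool_name: str, prefix: str) -> Rule:
    """Hypothetical helper: match one tool whose argument starts with prefix."""
    return lambda tool, arg: tool == tool_name and arg.startswith(prefix)


def tool_rule(tool_name: str) -> Rule:
    """Hypothetical helper: match every call to one tool."""
    return lambda tool, arg: tool == tool_name


@dataclass
class Policy:
    deny: List[Rule] = field(default_factory=list)
    ask: List[Rule] = field(default_factory=list)
    allow: List[Rule] = field(default_factory=list)
    sandboxed_tools: FrozenSet[str] = frozenset()
    dont_ask: bool = False  # non-interactive: anything that would prompt is denied


def settle(policy: Policy, tool: str, arg: str) -> str:
    """Return 'block', 'run', or 'prompt'; only 'prompt' spends operator attention."""

    def matches(rules: List[Rule]) -> bool:
        return any(rule(tool, arg) for rule in rules)

    would_prompt = "block" if policy.dont_ask else "prompt"
    if matches(policy.deny):
        return "block"        # checked first, no attention spent
    if matches(policy.ask):
        return would_prompt   # the one rule type that routes to a human
    if matches(policy.allow):
        return "run"
    if tool in policy.sandboxed_tools:
        return "run"          # containment, not enumeration, does the allowing
    return would_prompt       # mode default when nothing matched


policy = Policy(
    deny=[prefix_rule("Bash", "git push --force")],
    ask=[prefix_rule("Bash", "gh pr merge")],
    allow=[tool_rule("Read"), prefix_rule("Bash", "npm test")],
    sandboxed_tools=frozenset({"Bash"}),
)

if __name__ == "__main__":
    for tool, arg in [("Bash", "git push --force origin main"), ("Bash", "gh pr merge 42"),
                      ("Read", "src/app.py"), ("Bash", "make lint"),
                      ("WebFetch", "https://example.com")]:
        print(f"{tool:8} {arg:32} -> {settle(policy, tool, arg)}")
```

In this example policy, two calls out of five reach a human: the merge and the unmatched fetch. Every other decision was made once, when the rules were written.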

Layer           | Decision encoded                 | Lives in                 | Per-call attention cost
Deny rule       | Never, regardless of context     | permissions.deny         | Zero
Allow rule      | Pre-approved, always             | permissions.allow        | Zero
PreToolUse hook | Scripted condition, then verdict | .claude/hooks + settings | Zero
OS sandbox      | Contained, so let it run         | Sandbox config           | Zero
Ask prompt      | A human must judge this          | Ask rules / mode default | One unit per call

A few honesty notes belong next to that table, because the layers are not equally trustworthy and none of them is free. Anthropic's own docs describe argument-constraining bash patterns as fragile. The sandbox does not cover the built-in file tools. By default it can still read credential files such as those under ~/.ssh, and it inherits your environment variables, so lock both down explicitly before trusting it with secrets-adjacent work. One wrinkle from the docs: a sandboxed command can skip a bare whole-tool Bash ask rule, though content-scoped ask rules and deny rules still hold. And "zero" is per-call, not free: rules cost design, testing, and a periodic audit up front. What they do not cost is attention every time the agent acts.

For non-interactive contexts, the same logic produces a specific recommendation I have made before for CI pipelines: prefer dontAsk mode, which auto-denies anything that would have prompted, over bypassPermissions, which auto-approves it. Both produce a prompt-free run. One of them fails closed. And because dontAsk cannot prompt at all, the judgment-class actions described below have to become deny rules in non-interactive runs, with the genuinely ambiguous cases routed to a human outside the pipeline.
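The difference between the two modes shows up only on calls that no rule settled. Here is a minimal Python sketch of that difference. It models only calls that were either already allowed or would have prompted. Check the permissions docs for how each mode treats explicit deny rules; this sketch does not claim either behavior. The CI commands are hypothetical.

```python
from typing import Dict, List, Optional

# "allow" if a rule or the sandbox already settled the call;
# None if nothing matched and an interactive session would have prompted.
Verdict = Optional[str]


def runs_headless(verdict: Verdict, mode: str) -> bool:
    if verdict == "allow":
        return True
    if verdict is not None:
        raise ValueError(f"this sketch only models 'allow' or None, got {verdict!r}")
    if mode == "dontAsk":
        return False  # fails closed: the would-be prompt becomes a denial
    if mode == "bypassPermissions":
        return True   # fails open: the would-be prompt becomes an approval
    raise ValueError(f"unknown non-interactive mode: {mode!r}")


def executed(calls: Dict[str, Verdict], mode: str) -> List[str]:
    """Commands that would actually run in a prompt-free session."""
    return [cmd for cmd, verdict in calls.items() if runs_headless(verdict, mode)]


ci_calls: Dict[str, Verdict] = {
    "npm test": "allow",
    "gh release create v1.2.0": None,
    "curl -sSL https://example.invalid/setup.sh | sh": None,
}

if __name__ == "__main__":
    for mode in ("dontAsk", "bypassPermissions"):
        print(mode, executed(ci_calls, mode))
```

Both modes finish without a prompt. Under dontAsk, the release and the piped installer simply do not happen until someone writes a rule for them. Under bypassPermissions, both run unreviewed.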

Does the structural approach pay off in practice? My own numbers say yes, with the honest caveat that one repo and one failure class is an existence proof, not a benchmark. After the four-session bypass incident, I spent roughly an hour on structural hardening: a write-path guard hook, a dispatch-preamble hook, deny rules backing the merge gate. Since then I have run those four to eight concurrent sessions per working day with zero recurrence of that failure class. One hour of rule-writing bought back the attention that dozens of daily prompts had been spending for me. The engineering managers I write for usually have this exact inversion available: the rules their teams keep re-approving manually are the ones most eligible to become structure. If a rule has to hold every time, give it teeth, not a paragraph. The same routing logic that applies to CLAUDE.md directives, which are advisory by design, applies to your approval surface.
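A merge gate of this kind is the sort of rule that belongs in a hook. Below is a minimal PreToolUse hook sketch in Python, not my production hook. It assumes the hook contract as I read the Claude Code hooks docs:
- The event arrives as JSON on stdin, with `tool_name` and `tool_input.command`.
- Exit code 2 blocks the call and returns stderr to the model.

Verify both against your version. The wrapper path is hypothetical.

```python
import json
import re
import sys
from typing import Optional, TextIO

# Hypothetical cleanup wrapper; substitute your own script path.
WRAPPER = "scripts/merge-with-cleanup.sh"
# Whitespace-tolerant, but still a string match (see the denylist caveats later on).
DIRECT_MERGE = re.compile(r"\bgh\s+pr\s+merge\b")


def verdict(event: dict) -> Optional[str]:
    """Return a reason to block, or None to let the call through."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    if DIRECT_MERGE.search(command):
        return f"Direct PR merges are gated. Run {WRAPPER} <pr-number> instead."
    return None


def main(stream: TextIO = sys.stdin) -> int:
    if stream.isatty():
        return 0  # run by hand, not by Claude Code: nothing to judge
    raw = stream.read()
    if not raw.strip():
        return 0
    reason = verdict(json.loads(raw))
    if reason:
        print(reason, file=sys.stderr)  # returned to the model with the block
        return 2  # exit code 2: block the tool call
    return 0


if __name__ == "__main__":
    code = main()
    if code:
        sys.exit(code)
```

The agent still merges, just through the wrapper. The wrapper's own internal `gh pr merge` call is invisible to the hook, because the hook only sees the command the agent issued.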

The steel man: structure fails too

The strongest objection to everything above is not "humans should review more." It is that the deterministic layers I just praised have their own bypass catalog, and it is not short.

Start with denylists. Backslash Security demonstrated four distinct bypass techniques against Cursor's command denylist in July 2025: base64 obfuscation, subshell execution, script embedding, quote escaping. Cursor has reworked that auto-run surface since, but the underlying argument is mathematical and version-proof. Any command has unbounded syntactic variants, and a string-matching denylist enumerates finitely many of them. Sandboxes fare better but not cleanly. A researcher at Ona watched Claude Code slip a deny pattern with a path-resolution trick, and when the bubblewrap sandbox caught that, the agent did something more instructive than escaping: it asked for approval to run unsandboxed, and approval was granted. The boundary held. The prompt did not. (The same researcher's separate exec-gate prototype fell to an ELF dynamic-linker trick, so no vendor's enforcement layer gets to feel smug.) Cymulate found a conditional-protection flaw that let an agent inject hooks into a settings file the sandbox only protected if it existed at startup; that one was a tracked CVE, patched in Claude Code v2.1.2. And in May, Microsoft's security team published CVE-tracked remote-code-execution paths in AI agent frameworks, including a prompt-injection chain that used two exposed tools to walk through container isolation with no approval prompt anywhere in the path. Their conclusion: "your LLM is not a security boundary." The tools you expose define your attacker's affected scope. Across the whole catalog the pattern is consistent: string filters miss variants, sandboxes have edges, and the tools you expose set the blast radius.
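You can reproduce the enumeration problem without running anything dangerous. Here is a minimal Python sketch built on a hypothetical substring denylist. The variant strings are my own illustrations of the technique families named above, not Backslash's published payloads. The script only matches strings; it never executes them.

```python
import base64

# A hypothetical substring denylist, the shape many tools start with.
DENY_SUBSTRINGS = ["rm -rf", "git push --force"]


def naive_denied(command: str) -> bool:
    return any(pattern in command for pattern in DENY_SUBSTRINGS)


encoded = base64.b64encode(b"rm -rf build/").decode("ascii")

# Illustrative strings only; nothing here is executed.
VARIANTS = {
    "literal": "rm -rf build/",
    "short flag": "git push -f origin main",
    "quote escaping": "r'm' -rf build/",
    "subshell": "$(echo rm) -rf build/",
    "base64": f"echo {encoded} | base64 -d | sh",
    "script embedding": "python3 -c \"import shutil; shutil.rmtree('build')\"",
}

if __name__ == "__main__":
    for technique, command in VARIANTS.items():
        print(f"{technique:16} denied={naive_denied(command)}")
```

Only the literal form is caught. Each pattern you add closes one variant and leaves the rest of an unbounded set open. That is why containment does better work than enumeration.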

There is a regulatory layer to the objection too. The EU AI Act's Article 14 makes human oversight capability a legal requirement for high-risk systems, and the legal scholarship around it argues the Act targets awareness of automation bias without regulating the conditions that cause it. If you operate in or sell into regulated environments, a team that deletes its approval prompts and points at its deny rules may be operationally safer and still have a compliance problem.

So does the steel man win? Notice what it argues and what it does not. Every bypass in that catalog defeated a deterministic layer, and in the Ona case the approval prompt was the layer that failed outright: the researcher's point was that in a real workflow the request to run unsandboxed is "one more 'yes' in a stream of 'yes'," because approval fatigue turns a security boundary into a rubber stamp. The honest reading is not "structure replaces prompts" or "prompts backstop structure." It is that no single layer survives contact with a capable agent, which is precisely the case for layered guardrails, and that Microsoft's own remediation pattern, removing the exposed tool entirely, points at the cheapest control of all: a smaller tool surface has fewer things to gate.

There is also a directional reason to keep spending the budget structurally even with both layers imperfect. A rule is a decision made at rest: written once, reviewed deliberately, and patched for everyone the day a bypass is published, which is exactly what happened to the Cymulate flaw. A prompt is a decision made under load, remade dozens of times a day, and it degrades with every repetition. Both can be beaten. Only one of them gets weaker the more you use it.

The budget model absorbs this refinement without strain. Structure does not exist to replace human judgment. It exists to spend the routine decisions cheaply so the prompts that remain arrive rarely enough to be read. A prompt that fires twice a day on genuinely ambiguous actions is a functioning control. The same prompt buried in a stream of ninety is noise that occasionally contains a catastrophe.

Which prompts deserve to exist

If you accept the budget model, the practical question inverts. Instead of "which actions are risky enough to prompt on?" ask "which decisions is a human the only valid evaluator of?" Stable, repeated decision classes should be settled by a rule in one direction or the other; classes you don't understand yet can keep their prompts while you watch and learn their shape.

Three properties earn a prompt. Irreversibility: actions with no undo, like force-pushes, production deploys, data deletion. Outward reach: anything that leaves the machine, publishes, sends, or spends. Scope change: an agent asking to widen its own permissions or touch credentials. What the three share is that the cost of a wrong yes is open-ended and context-dependent; a rule written in advance cannot price that blast radius the way the person watching the specific change can. Treat them as a floor, not a ceiling: privileged reads of customer data or anything secrets-adjacent, and pulling new dependencies in from the network, can earn a prompt without fitting any of the three neatly. And frequency stresses the test honestly: a pipeline that deploys irreversibly twenty times a day can't prompt twenty times a day. That's not an exception to the model. It is the model telling you to move the human decision upstream, into the review that approves the pipeline's rules, instead of leaving it downstream where run-time volume will rubber-stamp it. AWS's agentic security guidance lands in the same place from the liability side, warning that under prompt volume "approval becomes reflexive rather than deliberate, shifting liability to humans who have been placed in a position to fail", and reserving human gates for high-consequence actions. Meta's Agents Rule of Two is candid that even well-designed gates fail against "a user blindly confirming a warning interstitial." The vendors converging on this are not arguing for zero prompts. They are arguing for few enough that confirmation stays a decision.
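The test above can be written down as a routing function. Here is a minimal Python sketch. The three flags are the floor described above; add your own flags for privileged reads or new dependencies. The frequency threshold of three runs a day is a hypothetical placeholder, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionClass:
    name: str
    irreversible: bool = False   # no undo: force-push, deploy, deletion
    outward_reach: bool = False  # leaves the machine: publishes, sends, spends
    scope_change: bool = False   # widens permissions or touches credentials
    understood: bool = True      # False while you are still learning the class
    runs_per_day: int = 1


def route(action: ActionClass, max_daily_prompts: int = 3) -> str:
    """Decide where a decision class should be settled (hypothetical threshold)."""
    earns_prompt = action.irreversible or action.outward_reach or action.scope_change
    if not earns_prompt:
        # Stable, repeated classes get a rule; unfamiliar ones keep a prompt for now.
        return "rule" if action.understood else "prompt"
    if action.runs_per_day > max_daily_prompts:
        return "upstream review"  # judge the pipeline's rules, not each run
    return "prompt"


if __name__ == "__main__":
    for action in [
        ActionClass("npm test", runs_per_day=40),
        ActionClass("force-push", irreversible=True, outward_reach=True),
        ActionClass("pipeline prod deploy", irreversible=True, outward_reach=True, runs_per_day=20),
        ActionClass("grant wider permissions", scope_change=True),
    ]:
        print(f"{action.name:24} -> {route(action)}")
```

The frequent irreversible deploy is deliberately not sent to a run-time prompt. Its human decision moves to the review that approves the pipeline's rules.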

A starting split for a working team, tuned to Claude Code but portable in shape:

  1. Deny-list the catastrophic. Force-push, recursive delete outside the workspace, anything touching secrets. These decisions need a human exactly once, at rule-writing time.
  2. Auto-allow the provably routine. Reads, builds, tests, linters. Put bash work in the sandbox so containment, not enumeration, does the allowing. But configure the sandbox before trusting it: it ships disabled, it warns and runs commands unsandboxed when it cannot start unless you set it to fail closed, it will retry failed commands outside the sandbox unless you disable that escape hatch, and its default read policy still reaches credential files. The sandboxing docs cover each switch; flip them closed first.
  3. Hook the conditional. Rules with logic in them ("merges go through the cleanup script," "no writes outside the active worktree") become PreToolUse scripts, not standing instructions the model can deprioritize.
  4. Prompt on the three properties above. Then watch your own approval rate. Treat a rate that sits above 90 percent as a trigger to segment prompts by action class and ask which repeated classes a rule should retire; the telemetry says that is the default state.
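Step 3's "no writes outside the active worktree" rule can be sketched as a hook in the same way. Here is a minimal Python sketch with several assumptions. It assumes the hook event carries `tool_name`, `cwd` and `tool_input.file_path`, as I read the hooks docs. It assumes the write tools are named `Write`, `Edit` and `MultiEdit`, and that `cwd` is the worktree root. It also assumes exit code 2 blocks the call. Check all of these on your version.

```python
import json
import sys
from pathlib import Path
from typing import Optional, TextIO

# Tool names assumed from current Claude Code builds; adjust for yours.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def verdict(event: dict) -> Optional[str]:
    """Block file writes that resolve outside the session's working tree."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return None
    # Assumption: the event's cwd is the active worktree root.
    root = Path(event.get("cwd") or ".").resolve()
    target = Path(event.get("tool_input", {}).get("file_path", ""))
    if not target.is_absolute():
        target = root / target
    target = target.resolve()  # collapses ".." and follows existing symlinks
    try:
        target.relative_to(root)
    except ValueError:
        return f"Write blocked: {target} is outside the active worktree {root}."
    return None


def main(stream: TextIO = sys.stdin) -> int:
    if stream.isatty():
        return 0  # run by hand, not by Claude Code
    raw = stream.read()
    if not raw.strip():
        return 0
    reason = verdict(json.loads(raw))
    if reason:
        print(reason, file=sys.stderr)
        return 2  # exit code 2: block the tool call
    return 0


if __name__ == "__main__":
    code = main()
    if code:
        sys.exit(code)
```

Resolving the path before comparing is the part that matters. A relative `../` path or an absolute path elsewhere fails the containment check, however it was spelled.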

One deployment-reality caveat, because the research demands it. The MIT-hosted AI Agent Index, compiled in early 2026, surveyed deployed agent systems and found only nine of thirty document any sandboxing and most disclose almost nothing about safety controls. If your stack has no structural layer to route decisions into, the budget model still applies; you just start smaller. A deny list and one hook is an afternoon of work, not a platform rebuild. My hour of hardening came after the incident. I would price the same hour differently now.

This is also where I will say the quiet part about my own work: building this layer for teams, the deny rules, hooks, sandbox profiles, and permission modes matched to how a specific organization works, is what my Claude Code Infrastructure Setup service does. The pattern transfers; the specifics never do.

The implication of the budget model reaches past tooling, though. If attention is the binding constraint, then every workflow decision that multiplies prompt volume, more agents, more sessions, more gates added "to be safe," is a security decision, whether or not anyone framed it that way. The teams that scale agentic work safely will be the ones that treat the verification layer as something you deliberately build, and treat human attention the way they already treat compute: as a resource with a price, allocated where it buys the most. When I train rollout groups on Claude Code, the advisory-versus-deterministic distinction is the one that reorganizes how they think about every other control.

FAQ

Four questions that come up every time I make the budget argument, answered in its terms.

Is approval fatigue the same as alert fatigue?

Siblings, not twins. Alert fatigue degrades monitoring: operators stop reading signals about things that already happened. Approval fatigue degrades consent: operators stop evaluating actions before they happen, so each reflexive click authorizes something. The mechanics are shared, which is why the SOC numbers, roughly 4,484 alerts a day with 67 percent ignored, are a fair preview of where unmanaged prompt volume leads.

Should I just run bypassPermissions?

It does not spend the budget better; it deletes the budget. bypassPermissions has a legitimate home in throwaway containers holding no credentials and nothing you would miss. Everywhere else, dontAsk plus explicit allow rules plus the sandbox gets you the same non-interactive flow while keeping a deny layer between the agent and everything you never pre-approved. Fail closed, not open.

How many approval prompts per session is too many?

There is no magic number, and counting prompts measures the wrong thing. Measure your approval rate. Anthropic's telemetry puts the default at 93 percent approved; if your rate looks like that, the prompts are not functioning as decisions. Retire prompts into rules until the ones that remain are rare enough that you read them.

Do allowlists make agents less safe?

An allow rule is a decision made once with full attention instead of hundreds of times with none, and that trade is usually favorable. But allowlists rot. Entries added for convenience accumulate, and a broad allow rule can quietly disarm other guardrails; mine bypassed a merge-gate hook for four sessions before I caught it. Keep entries narrow, audit them on a schedule, and pair the allowlist with deny rules for anything that must never run unattended.
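A scheduled audit can catch the specific failure I hit: an allow rule broad enough to match a command a hook is supposed to gate. Here is a minimal Python sketch, with these assumptions:
- Rules parse as `Tool` or `Tool(spec)`, with a trailing `*` marking a prefix wildcard. This is an assumption about the syntax, so check your version's permissions docs.
- Prefix matching stops at a word boundary, which is also my assumption.
- The guarded-command list is hypothetical and maintained by hand.

Since the current docs say a blocking hook wins over allow rules, treat a hit as a review item, not proof of a live bypass.

```python
import re
from typing import List, Optional, Tuple

# Assumed rule shape: "Tool" or "Tool(spec)", where a trailing "*" (optionally
# preceded by ":") marks a prefix wildcard.
RULE_SHAPE = re.compile(r"^(?P<tool>\w+)(?:\((?P<spec>.*)\))?$")


def parse_rule(rule: str) -> Tuple[str, str, Optional[str]]:
    """Return (tool, kind, value) where kind is 'all', 'prefix', or 'exact'."""
    m = RULE_SHAPE.match(rule.strip())
    if m is None:
        raise ValueError(f"unrecognized rule: {rule!r}")
    tool, spec = m.group("tool"), m.group("spec")
    if spec is None or spec.strip() in ("", "*"):
        return tool, "all", None
    if spec.endswith("*"):
        return tool, "prefix", spec[:-1].rstrip(": ")
    return tool, "exact", spec


def rule_matches(rule: str, tool: str, command: str) -> bool:
    rule_tool, kind, value = parse_rule(rule)
    if rule_tool != tool:
        return False
    if kind == "all":
        return True
    if kind == "exact":
        return command == value
    # Word-boundary prefix match: an assumption, not documented matcher behavior.
    return command == value or command.startswith(value + " ")


def shadowing_rules(allow: List[str], guarded: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Allow rules that would also match a command a hook is supposed to gate."""
    return [(rule, cmd) for rule in allow for tool, cmd in guarded
            if rule_matches(rule, tool, cmd)]


ALLOW = ["Read", "Bash(npm test:*)", "Bash(gh pr view:*)", "Bash(gh:*)"]
GUARDED = [("Bash", "gh pr merge")]  # hypothetical: commands a hook gates

if __name__ == "__main__":
    for rule, cmd in shadowing_rules(ALLOW, GUARDED):
        print(f"review: allow rule {rule} also matches guarded command {cmd!r}")
```

The narrow `gh pr view` rule passes. The convenience rule for all of `gh` gets flagged, which is exactly the kind of entry that accumulates unnoticed.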

Spend it where judgment is irreplaceable

The 93 percent number is not an indictment of the people clicking approve. It is a readout of a design that asked them to make more decisions than attention allows, then counted each click as oversight. The budget model just makes the accounting honest: deny rules, allow rules, hooks, and sandboxes are how you buy decisions wholesale, and the approval prompt is how you pay retail. Retail is the right price for a small number of judgment calls. It is a terrible price for ninety a day.

If you want a second set of eyes on your own approval surface, where your prompts fire, which ones a rule should retire, and what your approval rate is telling you, that is a conversation I have with engineering leaders regularly. Fifteen minutes on a call is enough to map your permission layers against the budget and leave you with the two or three rule changes that buy back the most attention.