The promise that sells Claude Code subagents is speed: fan the work out, run it in parallel, finish faster. That's the headline. It's also the wrong reason to reach for them.
A multi-agent system burns roughly 15 times the tokens of a single chat. That figure is Anthropic's own, from the post where they describe building their internal research system, the same system that then beat a single agent by 90.2% on the eval it was built for. High cost, high upside, and a Gartner forecast that over 40% of agentic AI projects get canceled by 2027. That spread isn't all luck. A big part of it is whether the work was the right shape for orchestration before anyone split it.
I started running subagents because they were fast. I kept the pattern for a different reason. A Claude agent scoped to one job, with its own context, returns more trustworthy output and hallucinates less than one agent asked to research, code, review, and design in a single window. Scope is the decision. Speed is the side effect. What follows is the map I use to decide whether a task should be orchestrated at all, and which pattern it gets if it should.
Orchestration is a scope-of-work test, not a speed play
The default is one agent. Anthropic's guidance on building effective agents is blunt about it: add complexity only when it demonstrably improves outcomes. Orchestration is complexity. It costs latency, it costs tokens, and it adds a coordination surface that fails in its own ways.
So the real question is not "can this run in parallel?" It's "does this work split into distinct scopes that each get more trustworthy in a dedicated agent?" Research, coding, QA, and visual design are different jobs with different failure modes. An agent holding all of them at once spreads its attention thin and its context wide, and that's exactly where hallucinations creep in. Give each scope its own agent, its own context window, and its own tool permissions, and the verbose middle stays out of the orchestrator. Only the digested result comes back.
That's the property you're buying: isolation. The speed is real, but it is a consequence, not the point.
None of this means speed doesn't matter. A deadline can be the reason you reach for fan-out. It just can't be the test that tells you the split is safe. Scope decides what may run in parallel; the deadline decides how hard you push once it does.
The four gates, and the one that decides
Four questions settle it. The first does most of the work. The other three shape how you isolate once you have said yes.
- Scope of work. Does the task split into distinct kinds of work, each with its own inputs and its own failure modes? If there is only one scope, stay flat. One agent, one context.
- Firsthand-knowledge coupling. Does any piece need intimate, in-the-moment context it cannot reconstruct from another agent's summary? If so, keep that piece in one context. A summary is lossy, and the implicit decisions buried in someone else's work do not survive the handoff.
- Context blast radius. Would running everything in one window let an early mistake spread? An upstream hallucination gets treated as fact by every step after it. Isolation contains the blast radius, and a dedicated challenge agent early in the flow catches the bad fact before it travels.
- Write safety. Do several steps write to the same place? Keep writes single-threaded behind one owner with a reconciliation gate. Parallel writers make conflicting implicit choices that no summary reconciles cleanly.
Gate one is the one engineers skip. They feel a task is big, reach for subagents, and never ask whether the work was more than one scope.
The same gates pick the pattern
The gates do double duty. The answers that tell you whether to orchestrate also tell you which pattern fits. Past gate one, the shape of the work selects the pattern.
| Pattern | Reach for it when | Token cost | What isolation buys |
|---|---|---|---|
| Stay flat | One scope, bounded or tightly coupled | 1x | Nothing to isolate; coordination would only add risk |
| Parallel fan-out | Independent scopes, one deadline | ~N x | Each scope runs clean in its own context |
| Orchestrator-worker | Decompose, dispatch, then synthesize | several x | Verbose exploration stays out of the lead's window |
| Sequential review | Correctness-critical, upstream errors costly | 2-3x | A dedicated reviewer catches a bad fact before it spreads |
| Planner-executor | Deep, multi-stage work | highest | Layered scopes, each isolated from the others |
When the fan-out branch wins and the subtasks are mechanical, /batch is the bundled implementation to reach for. For the four patterns with full cost math and failure modes, the four-pattern catalog is the long-form companion to this map.
This is not theory I read about. On this site's own publishing pipeline, the orchestrator fans out in two waves of read-only specialist agents, five for research and five for validation, each in its own context, and it owns every write itself. A measured probe clocked three native subagents dispatched in one message at 14.87 seconds against 43.8 for the same work run serially. The speed was a bonus. The pipeline is built that way because a fact-checker that works from the claims alone, without the draft's persuasive framing, checks them more honestly than one steeped in the author's case. That's the discipline I think of as production agentic delivery: the structural guarantee does the work, not a reminder to be careful.
Both ways to get it wrong
There are two failure directions, and the test catches both.
Over-orchestrate, and you pay the coordination tax for nothing. The MAST study of 1,642 multi-agent execution traces found seven production systems each failing between 41% and 87% of the time on their own benchmarks, and noted that the gains over a single agent are often minimal. A sharper result followed: on multi-hop reasoning with an equal thinking-token budget, a single agent matched or beat the multi-agent setup, because every handoff between agents can lose information a single context would keep. That study deliberately excluded tool use, code execution, and vision, which is exactly the work where scope separation does pay. A pure reasoning chain that fits in one window is the "stay flat" leaf, not a counterexample.
Under-isolate, and one early hallucination poisons everything after it. This is the failure I've watched hurt the most. A wrong fact lands upstream, gets reinforced at every step, and is locked in as false consensus by the end. The fix is not a bigger context window. It's a dedicated agent whose only job is to challenge the upstream output before the rest of the pipeline trusts it.
Over-orchestration
- A single bounded scope split across agents
- Tightly coupled work that needed shared context
- Paying a multi-agent token premium for speed on a reasoning chain
- Coordination failures the one agent never had
Under-isolation
- One generalist agent doing research, code, and QA at once
- An upstream hallucination treated as fact downstream
- No dedicated reviewer until the very end
- False consensus locked in by the final step
The gates sit between these two ditches. Scope of work keeps you out of the left one. Blast radius and the early challenge agent keep you out of the right one.
Run the test before you split
The whole thing fits on one screen, which is the point. Before you split a task across agents, walk it through the four gates. Scope first: if the work is one job, keep it one agent. If it's many jobs, isolate each one, keep your writes single-threaded, and put a challenge agent early. The pattern falls out of the same answers that told you to orchestrate.
For the production failure-mode taxonomy and the sizing rule for when orchestration stops paying, the Subagent Orchestration in Production guide goes deeper than a one-screen map can. If you're wiring orchestration into a team's workflow and want a second set of eyes on where the gates fall for your actual work, that's the kind of thing I do in agentic workflow engagements. Book a 15-minute call and bring one task you are unsure whether to split.
FAQ
Do subagents make Claude Code faster?
Often, but speed is the wrong reason to reach for them. Orchestration multiplies token cost; Anthropic measured multi-agent systems at roughly 15 times a single chat. And on pure reasoning tasks with an equal token budget, a single agent can match a multi-agent setup. Reach for subagents when scope separation makes the work more trustworthy, not just faster.
When should I not orchestrate subagents?
When the task is a single scope, when it is small and bounded, or when a worker would need firsthand context it cannot reconstruct from another agent's summary. Tightly coupled work loses its meaning when you split it across agents.
How do I decide which orchestration pattern to use?
Run the same four gates. One scope means stay flat. Independent scopes mean parallel fan-out. Decompose-and-synthesize means orchestrator-worker. Correctness-critical upstream work means a sequential review with a dedicated challenge agent placed early.