The busiest production pipeline I run is staffed by twelve specialist subagents. A fact-checker, a skeptic-reader, an SEO scanner, nine more. The same repo carries a shelf of skills, and the two file types sit a few directories apart, which is about how close the confusion lives. Across the Claude Code rollouts I've trained and audited, the failure report is nearly always the same sentence: "Claude Code is ignoring our rules." The rules weren't ignored. They were routed into the wrong mechanism.
The stakes aren't niche anymore. The 2025 DORA report puts 90% of surveyed technology professionals on AI at work, with 30% still reporting little or no trust in AI-generated code. Configuration is where that trust gets built or burned, and Claude Code gives you two configuration surfaces that look interchangeable when they are anything but.
So here is the claim I'll defend: Claude Code subagents vs skills is a context-architecture choice, not a capability choice. A skill loads procedural knowledge into the session that invoked it. A subagent staffs the work out to a separate context window with its own tool boundaries. Pick by where you want the reasoning to live, not by which file format you learned first.
What's the difference between a subagent and a skill in Claude Code?
A skill teaches the session. A subagent staffs it.
Mechanically: a skill is a folder holding a SKILL.md file. Its frontmatter description, a sentence or three of routing text, sits in your session's context from startup so Claude knows the skill exists. When the task matches, the full body loads into the current conversation and Claude follows it right there, with your conversation history, your earlier decisions, and your project state all still in view. The reasoning never leaves the room. That's the property that makes Claude Code skills the right home for conventions, checklists, and procedures, and it's why I called skills the advisory layer in the rule-routing decision tree.
A subagent is a different animal. It's a Markdown file in .claude/agents/ whose body becomes the system prompt of a fresh, separate context window. Your conversation history does not come along. Your invoked skills don't either. Going in, the subagent gets the prompt the parent writes when it dispatches, plus your standing project context: CLAUDE.md and git status load into every custom subagent. Coming back, the parent's context gets the final message and nothing else. The intermediate file reads, tool calls, and dead ends stay out of the parent's window; the subagent's own transcript persists and can be resumed, and any writes you allowed it persist too. So this is context isolation, scoped by the allowlist, not a sandbox. That context isolation is the entire point, and it's the foundation subagent orchestration builds on.
| Dimension | Skill | Subagent |
|---|---|---|
| Where the reasoning lives | Inside the invoking session | A separate context window |
| What sits in context at startup | Description only | Nothing until dispatched |
| What crosses in | The whole conversation so far | Dispatch prompt + CLAUDE.md + git status |
| What crosses back | Everything stays in context | Final message only |
| Tool surface | Session's tools (gateable while active) | Its own allowlist |
One detail in that table deserves a second look. The "what crosses in" row cuts both ways. A subagent can't be confused by your session's clutter, and it also can't benefit from three hours of accumulated decisions unless you restate them. Forget that, and you'll wonder why your specialist returned generic work. It wasn't being lazy. It never saw the context you assumed it had.
How to create a custom subagent: the .claude/agents file format
The file format is genuinely the easy part. Here's a trimmed version of the fact-checker agent from that production pipeline:
---name: fact-checkerdescription: Verifies every statistic, quote, and capability claim in a draft against its cited source. Use after a draft is complete, before publish. Returns a findings table; does not edit files.tools: Read, Grep, WebSearch, WebFetchmodel: sonnet---
You are a fact-checking specialistfor technical blog posts.
For every hard claim in the draft(number, date, quote, capability):
1. Locate the claim's cited source in the outline's Sources table.2. Fetch the source and read it neutrally; ask what the page says, never assert the claim.3. Classify it: VERIFIED, CONTRADICTED, or UNVERIFIABLE.4. Return one findings table, sorted by severity. Do not rewrite the post.Four decisions are hiding in those few lines, and each one is a context-architecture decision wearing a config field's clothes.
Create the file
Project-scoped agents live in .claude/agents/, user-scoped ones in ~/.claude/agents/. On a name collision the precedence order is managed, CLI flag, project, user, then plugin.
Write the description as a routing rule
This field is how Claude decides whether to delegate. Describe the exact situations that should trigger dispatch, not the agent's job title.
Scope the tools
The tools field is an allowlist. Omit it and the subagent inherits broad access; set it and the boundary is enforced at dispatch, not by politeness.
Pick the model
Aliases like haiku, sonnet, and opus work, as does inherit. Cheap models for mechanical lanes, your main model for judgment calls.
Restart, then test dispatch
File-based agents load at session start, so a mid-session file is invisible until you restart. Test by naming the agent explicitly before trusting auto-delegation.
The field that fails quietly is the description. Anthropic's docs tell you to write it like a routing rule, and the same page carries a troubleshooting section titled "Claude not delegating to subagents" with three documented causes: the Agent tool isn't allowed, the description is too vague to match, or the prompt never names the agent. Read that section as a design constraint rather than an FAQ. Auto-delegation is probabilistic description-matching, so a description like "helps with code quality" competes badly against every other vague description in the catalog. "Use after a draft is complete, before publish" gives the matcher an actual condition.
The tools line is the second quiet decision, and it's the one I lean on hardest in production. The twelve subagents in my pipeline are read-only on disk by design: researchers and reviewers carry Read, Grep, WebSearch, WebFetch and nothing else, and the orchestrating session owns every write. That rule isn't a convention the agents are asked to follow. It's an allowlist the harness enforces. A skill can gate tools too (its frontmatter accepts allowed-tools and disallowed-tools while the skill is active), but it gates them inside the shared session; only a subagent pairs the tool boundary with a separate context.
Two version notes worth pinning, since this surface moves. As of Claude Code v2.1.172, shipped June 10, 2026, file-based subagents can spawn their own subagents up to five levels deep, which used to be a hard no. I pushed on that depth limit in a deep-dive on nested subagents, where a probe ran nine levels deep without hitting the cap. The change is changelog-fresh: the SDK's programmatic subagents still can't nest, and the main docs page hadn't caught up to the changelog as of this writing. And name collisions resolve by a fixed precedence order, so a project-level agent silently beats your personal one with the same name.
When a subagent is the right call
If the mechanism is "a separate context with its own tool surface," the routing question writes itself: does this work belong in a different room? Four signals say yes.
First, noisy intermediate output. Codebase exploration, log spelunking, dependency audits: work that generates thousands of tokens the parent will never need again. Anthropic's own guidance on when to delegate leads with exactly this case, and it's the reason my research agents return a findings table instead of their forty intermediate page-fetches.
Second, parallel lanes. On April 17 I dispatched three research subagents in a single message and they finished in 14.87 seconds; the same three dispatched sequentially took 43.8. Independent tasks sent together run concurrently, and the wall clock shows it.
Third, a hard tool boundary. A reviewer that physically cannot edit files. A researcher with web access but no Bash. If the constraint matters, it belongs in an allowlist, not in a paragraph of instructions.
Fourth, fresh eyes. Ever asked the author of a bug to review their own fix? A verification pass loses value when the verifier shares the author's context and its accumulated biases. A subagent starts cold by construction.
Now the bill, because a yes on a signal means "consider it," not "dispatch reflexively." Every one of those signals buys isolation with tokens. Anthropic's cost docs put agent teams at approximately 7x the tokens of a standard session when teammates run in plan mode. That figure describes a separate multi-session feature, not plain subagents, but the direction transfers: separate parallel contexts cost more than one session. One practitioner's measurement put three subagents at about four times a single-thread session's spend; treat that as one developer's benchmark rather than a vendor figure, but it matches my own bills. The community has also documented the squeeze from the other side: a long-running GitHub issue describes parallel subagents sharing the parent session's token budget while each return is capped around 8,192 output tokens, which is how a five-agent codebase analysis ends in truncation. The working pattern, there and in my pipeline, is files-as-interface: subagents write their full findings to disk and return a short summary.
There's an over-application failure mode, too. A January 2026 evaluation of multi-agent systems cites empirical evidence that a single agent's performance often degrades once its toolkit grows past 8 to 12 tools. Decomposition earns its keep when it relieves that kind of overload, not when it decorates a task one focused agent handles fine; that second half is my conclusion from running this pipeline, not the paper's. Six subagents reviewing a function is not rigor. It's six invoices.
If the signals are split, the four-gate decision map walks the orchestration call in more depth, and the orchestration patterns post covers what to do after you commit. However the call goes, all four signals reduce to the thesis: route by where the reasoning should live, not by which mechanism feels more powerful.
When a skill is the right call
Flip the question. Some knowledge is only useful inside the session's own reasoning trace. Your team's PR conventions. The deploy checklist. The exact shape a migration plan should take, applied to the migration the session has spent an hour discussing. Send that to a subagent and you've mailed the style guide to a contractor who has never seen the codebase; load it as a skill and the session applies it to everything it already knows.
The first skill I ever shipped is exactly this shape: a code-archeology skill that teaches the session how to work an unfamiliar codebase. Which files to read first, which naming conventions carry signal, where the bodies tend to be buried. As a subagent it would be the wrong default. The investigation it guides happens in the session where I'm asking the questions, and the heuristics have to be present where that reasoning is happening.
In dollars, Anthropic's cost documentation makes a version of the same argument: a skill that carries domain knowledge saves the tokens Claude would otherwise spend rediscovering your architecture file by file. I'll flag the tension honestly, because it complicates my framing: the official docs sell skills cost-first, and a fair reader could conclude the skill-vs-subagent call is a budgeting decision. I think the cost lens and the architecture lens mostly agree, since tokens are what you pay whenever knowledge sits in the wrong place. But the docs' framing is a reminder that "cheaper" and "architecturally correct" usually point at the same file for different reasons.
Skills come with their own scaling trap, and it lives in the same field that routes subagents: the description. Research published this spring measured what happens to skill routing as catalogs grow. The SkillRouter work found that hiding skill bodies from the routing input costs 31 to 44 percentage points of selection accuracy across retrieval architectures, and a companion study of scaling laws for skill libraries fit a clean log-decay: accuracy falls roughly three percentage points every time the catalog doubles. Treat the exact figures as study-specific, but the principle held across model families, and it matches what I see in my own repo: description quality changes the slope of the decay, not the existence of it. For calibration, a four-skill catalog will not feel any of this. Mine passed eighteen skills this spring, and at that size every description already has to read as a firing condition rather than a job title, or the wrong skill wakes up. A skill catalog is a thing you weed.
Which is also why neither mechanism is an enforcement layer for when it runs. Both route probabilistically off descriptions; both can simply not fire. (After dispatch, a subagent's allowlist is real enforcement, but something has to dispatch it first.) Anything that must happen every time belongs in a hook, per the decision tree. And if you're ready to write the skill itself, the skill-creation guide covers that end to end. Same architecture question as the last section, opposite answer: this knowledge has to live inside the session's reasoning, so a skill is its home.
The boundary is blurrier than the comparison tables admit
Here's the strongest case against everything above, and it comes straight from the vendor. Anthropic's feature-comparison docs state it plainly: the mechanisms combine. A subagent can preload skills through its skills: frontmatter field. A skill can request isolated execution with context: fork. And since Dynamic Workflows shipped as a research preview in late May 2026 (Claude Code v2.1.154), there's a third layer that scripts orchestration above both. If the two mechanisms interconvert, calling the choice "architecture" might be dressing up a token-efficiency preference. Mischoose and you pay a few cents, not a failed workflow.
It's a fair objection, and the composition features are real. I still think it fails on two facts.
The first is that the blur has had sharp edges in practice. For months, skills loaded from plugins ignored their context: fork and agent: fields and ran in the current conversation anyway; Anthropic fixed it in v2.1.101, and a follow-up request was closed against the same fix. Look at what the failure was while it lived: no error, no warning, just the skill quietly running in your session when you had asked for isolation. The wrong choice here changes behavior, not just the bill, which is exactly the premise the objection denies. If a hybrid path matters to your workflow, test which context it actually lands in; the default is what you get the moment a regression ships.
The second fact runs the other way: subagent isolation is thinner than the marketing diagram. Per the official docs, only the built-in Explore and Plan agents skip your CLAUDE.md and git status; every custom subagent loads both. What a custom subagent sheds is your conversation history and your in-flight reasoning, not your project's standing context. Isolation is a dial, and the file format sets its default position.
So the composition case doesn't dissolve the distinction. It proves it. When I preload a skill into a subagent, I'm answering two different questions with two different files: the skill says what the work should know, the agent file says where the work should happen and what it may touch. A recent academic teardown of Claude Code found that only about 1.6% of the product is model-facing decision logic, with the rest being operational infrastructure around it. The extension surfaces follow the same philosophy. They're small, composable, and unopinionated about each other, which means the architecture opinions have to be yours.
FAQ
What is the difference between a subagent and a skill in Claude Code?
A skill loads procedural knowledge into the session that invoked it. The description sits in context from session start, the full body loads on trigger, and the reasoning continues in your conversation. A subagent runs in a separate context window with its own system prompt and its own tool allowlist; only the dispatch prompt crosses in, and only the final message comes back.
Can a subagent use a skill?
Yes, and it's the composition pattern worth learning: list skills in the subagent's skills: frontmatter field and they preload at startup, so the skill carries the knowledge while the agent file carries the boundary. The reverse path, a skill requesting isolation via context: fork, exists as well; it shipped with a gap (plugin-loaded skills ignored the fork until v2.1.101), so verify which context a hybrid skill lands in before relying on it.
Why isn't Claude delegating to my subagent automatically?
Three documented causes: the Agent tool isn't in the allowed tools, the description is too vague to match against the task, or the prompt never names the agent. Auto-delegation is description matching. Naming the subagent explicitly is the dependable path, and a description written as a routing rule ("use after a draft is complete, before publish") matches far more reliably than a job title.
Do subagents cost more than skills?
Usually. Each subagent pays its own startup and exploration costs, parallel agents multiply spend roughly linearly, and Anthropic's cost docs put agent teams near 7x a standard session's tokens. The same docs recommend skills as a cost reducer, since preloaded knowledge replaces token-expensive exploration. The spend is justified exactly when one of the four routing signals holds.
Pick the room before you write the file
Strip away the YAML and one question remains: where should this reasoning live? In your session, with everything it has learned this hour, or in a separate room with a narrow door? Skills answer the first. Subagents answer the second. Teams that route by that question stop writing agent files that should have been three paragraphs of SKILL.md, and stop pasting the same prompt weekly because nobody promoted it to a file at all.
Twelve subagents and a shelf of skills run my heaviest production pipeline every week, and the boundary between them is the most load-bearing decision in the system. The agent files own isolation and tool boundaries. The skills own procedure. Neither owns enforcement; that's what hooks are for.
If you're starting tonight, start small: one skill for the procedure your team re-explains most, one read-only subagent for the exploration work that keeps flooding your context. The skill-creation guide is the next read for the first half. And if you'd rather have a second set of eyes on your whole .claude directory, my Custom Skill Development work covers exactly this split; a fifteen-minute call is enough to find out whether your rules are routed into the right files.