What if the obstacle to scaling Claude Code in your engineering org isn't the model, the documentation, or the training plan, but the fact that your team thinks CLAUDE.md, skills, hooks, and plugins are roughly the same kind of thing?

Stack Overflow's 2025 Developer Survey put Claude Code at 40.8% adoption among developers using AI agents. Eighty-four percent of developers use or plan to use AI tools, up from 76% the prior year. The rollout is happening. The mental model isn't keeping up.

This week I had a 90-minute slot in a Code Quality training day at an enterprise SaaS org. Four hundred people on the call. Hundreds of engineers. Plus product managers, ops, and leadership rounding out the room. The chat told me everything I needed about how people were thinking. Six questions kept recycling, paraphrased differently each time. They came from engineers and non-engineers alike, which is what made the pattern interesting.

The answer to none of them was "go read the docs."

The flat mental model is the actual problem

Before the questions, the diagnosis. The room kept reaching for the same shape of answer: "Claude Code is a tool, the tool has a config, the config goes in CLAUDE.md, and when CLAUDE.md doesn't work I escalate." Four mechanisms and a marketplace, all collapsed into one mental model. The shape feels reasonable until you watch it produce a help-desk ticket that says "Claude Code is ignoring our rules" three weeks into a rollout.

The four mechanisms differ on three orthogonal axes the room had not separated. Whether the directive is always-on or on-demand. Whether it is advisory (the model decides how to interpret it) or deterministic (a script enforces it before the model gets a chance to dissent). Whether it is session-scoped or event-scoped. CLAUDE.md is always-on, advisory, session-scoped. A skill is on-demand, advisory, session-scoped. A hook is event-fired, deterministic, event-scoped. A plugin is the packaging layer that distributes the other three. The marketplace sits one level higher again as the distribution channel for plugins, which is why it is not in the comparison table below. Once those three axes are visible, the routing falls out almost mechanically.

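Four mechanisms, three axes

| Mechanism | Lifecycle | Enforcement | Scope |
| --- | --- | --- | --- |
| CLAUDE.md | Always-on | Advisory | Session |
| Skills | On-demand | Advisory | Session |
| Hooks | Event-fired | Deterministic | Event |
| Plugins | Packaging | Inherited | Cross-repo |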

Anthropic publishes a trigger table that does the routing for you. Claude gets a convention wrong twice, add it to CLAUDE.md. You keep typing the same prompt to start a task, save it as a user-invocable skill. You paste the same playbook three times, capture it as a skill. Something must happen every time without asking, write a hook. A second repository needs the same setup, package it as a plugin. The decision-tree post I wrote last week walks the routing logic in detail and is the longer companion to this piece. What follows are the six questions the room kept asking, in the order they arrived, with the answer that landed.

Q1: "What should I put in CLAUDE.md, and where else does it live?"

This was the first question, asked five different ways in the first ten minutes. The version that surprised me most came from a senior engineer who had clearly been productive with Claude Code for months: she just had not been told that CLAUDE.md exists in more than one location.

CLAUDE.md is loaded into every Claude Code session. The memory documentation describes four locations Claude reads from, in priority order:

| Location | Path | Loaded for | Use it for |
| --- | --- | --- | --- |
| Managed policy | /etc/claude-code/CLAUDE.md (org-deployed) | Every developer in the org | Non-negotiable org standards your platform team owns |
| Project | <repo-root>/CLAUDE.md | Anyone who opens this repo | Project-specific facts and conventions, committed to git |
| User | ~/.claude/CLAUDE.md | Just you, across every project | Personal preferences (your shell, your aliases, your style) |
| Local override | <repo-root>/CLAUDE.local.md | Just you, just this project | Personal notes for a project, not committed |

Most of the room had only ever opened the project file at the repo root. The platform-managed one is the lever a tech lead has when "every Claude Code session in our org should know X" is the actual ask. It is not a magic file. It is just a file at a specific path, deployed by the same machinery your org already uses for any other endpoint configuration.

The other half of the question is what belongs in CLAUDE.md versus what should live somewhere else. The 200-line rule is the ground truth. Anthropic's memory documentation is direct: "Files over 200 lines consume more context and may reduce adherence." CLAUDE.md gets loaded in full regardless of length, but full load is not full adherence. Once the file gets long enough to need a table of contents, the most important rule in it is also the most likely to get ignored.

If a directive is reference material (an API spec, a style guide, a debugging playbook), route it to a skill. If it is scoped to one part of the codebase (a worker directory, a specific service), route it to a path-scoped rule or a subdirectory CLAUDE.md. If it is supposed to fire whether Claude feels like it or not, route it to a hook. CLAUDE.md is for the small set of things Claude should always know, in every session, with no extra friction.
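A quick way to see where a team stands is to audit the four locations from the table against the 200-line target. The following is a minimal sketch, not an official tool: the function names are my own, and the managed-policy path is the one listed in the table above, which may differ on your platform.

```python
from pathlib import Path

LINE_TARGET = 200  # the target quoted from Anthropic's memory documentation


def candidate_files(repo_root: Path, home: Path) -> dict[str, Path]:
    """Return the four CLAUDE.md locations from the table, keyed by label."""
    return {
        "managed policy": Path("/etc/claude-code/CLAUDE.md"),
        "project": repo_root / "CLAUDE.md",
        "user": home / ".claude" / "CLAUDE.md",
        "local override": repo_root / "CLAUDE.local.md",
    }


def audit(files: dict[str, Path]) -> list[tuple[str, int, bool]]:
    """For each file that exists, report (label, line count, over target?)."""
    report = []
    for label, path in files.items():
        if not path.is_file():
            continue  # a missing location is normal, not an error
        lines = len(path.read_text(encoding="utf-8").splitlines())
        report.append((label, lines, lines > LINE_TARGET))
    return report


if __name__ == "__main__":
    for label, lines, too_long in audit(candidate_files(Path.cwd(), Path.home())):
        flag = "move reference material to a skill" if too_long else "ok"
        print(f"{label}: {lines} lines ({flag})")
```

Run it from a repo root. Any file flagged as over target is a candidate for the routing described above: reference material to a skill, must-hold rules to a hook.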

Q2: "Why does Claude Code keep ignoring directives?"

This was the diagnostic question. It came up in chat twice in the first fifteen minutes and once more by voice. Whoever asked it was usually frustrated, sometimes angry, occasionally three days deep into a debugging cycle they had concluded was a model bug.

Three causes, in order of how often I see them.

Cause one: the file is too long. Two hundred lines is the Anthropic-published target. The AGENTIF benchmark paper (arXiv preprint, May 2025, on Claude 3.5 Sonnet) measured what happens past it: instruction success rates collapse to near zero when instruction length passes roughly 6,000 words. The benchmark is a year old and tests a now-superseded model; Anthropic's own Sonnet 4.6 announcement says the newer models are "meaningfully better at instruction following," but no public benchmark has put a current number on the improvement at long context. The structural fact remains: CLAUDE.md is loaded in full but adherence degrades with length, and Anthropic's own memory docs say there is "no guarantee of strict compliance." The right reaction to a CLAUDE.md that has grown past 200 lines is to move reference material into skills, not to add a louder header to the rule that is being ignored.

Cause two: the session is polluted. The best practices page names two patterns by their behavior. The "kitchen sink session," where one task bleeds into another, then circles back to the first, leaves the context full of irrelevant tokens before the real work begins. "Correcting over and over," where Claude does the wrong thing, you correct, it does the wrong thing again, you correct, leaves the context full of failed approaches that the model now treats as legitimate prior turns. The fix in both cases is /clear and a fresh start with a better initial prompt that captures what you learned. There is also a quieter failure mode: a nested CLAUDE.md in a subdirectory does not auto-survive /compact, so a rule that worked in turn three may have evaporated by turn forty. The project-root CLAUDE.md is re-injected after compaction; nested ones reload only when Claude next opens a file in that subdirectory.

Cause three: the directive is in the wrong layer. This is the hardest one to see, because the symptom looks identical to causes one and two. If a rule must hold every time, no exceptions, and you have written it in CLAUDE.md, you have written a request, not a guard. The features overview is direct about this: "An instruction like 'never edit .env' in CLAUDE.md or a skill is a request, not a guarantee. A PreToolUse hook that blocks the edit is enforcement." The rule was not the problem. The layer was.

Note

The AGENTIF data is from May 2025 on Claude 3.5 Sonnet. It is the freshest published instruction-following benchmark of its kind, but it predates Claude Sonnet 4.6 and Opus 4.7. Treat the "near zero past 6,000 words" finding as order-of-magnitude evidence that adherence has a probabilistic floor, not as a current measurement. The honest framing is that even with a perfect mental model and a clean configuration, some portion of "ignored directive" failures are model-side. Better triage shrinks the avoidable share to zero. The rest is what it is.

Q3: "What skills should we be using, and at what points in the dev cycle?"

The framing that landed hardest in the room was a one-liner: if you have a multi-step workflow that you run often, turn it into a skill. The chat lit up on that one. Half a dozen "ohhh" reactions. It is the framing I default to whenever I teach this material, and the room response tracks closely each time. It is also Anthropic's own canonical phrasing, nearly verbatim: "You paste the same playbook or multi-step procedure into chat for the third time → capture it as a skill."

The room-friendly version covers the most common case but undersells the actual scope. The skills documentation and the features-overview together describe three distinct creation triggers, and a tech lead running a rollout should be able to spot each one as it shows up.

Trigger one: repeated multi-step playbooks. The classic case. You wrote a sequence of steps for "set up a new feature branch our way" or "run the pre-deploy checks" and you have pasted it three times. That sequence belongs in a skill. The skill body loads when invoked, runs the steps, and disappears from context when the conversation moves on.

There is a second trigger Anthropic calls out separately: "You keep typing the same prompt to start a task." Same content every Friday morning, you typing "let's run an incident review, here's the format..." for the third week running. That is not a playbook trigger. It is a user-invocable skill (think /incident-review), where the trigger is the invocation itself, not the steps inside it.

And then the third trigger, which is the one the rooms I train miss most. Open your CLAUDE.md and look for a section that has slowly grown into a procedure rather than a fact. The skills docs phrase it cleanly: "a section of CLAUDE.md has grown into a procedure rather than a fact." If you find yourself adding multi-step instructions there, that material has outgrown CLAUDE.md. A skill body loads only when invoked, so the same material costs almost nothing in context until you need it. The 200-line problem from Q2 often resolves itself when you do this triage one section at a time.

When in the dev cycle each kind fires depends on the trigger. Always-available reference skills (/api-reference, /postgres-cheatsheet) fire whenever Claude judges them relevant. User-invocable skills (/commit, /deploy, /run-incident-review) fire only when you type the command, which is the right shape for high-stakes operations you do not want Claude to "decide" to invoke. Background-knowledge skills with user-invocable: false are loaded by Claude in response to relevance signals but never appear in your slash menu, which is right for things like "how our legacy auth system works" that the model needs to know but you would never type as a command.

If you have not built a skill yet, the actual mechanics are short. I wrote a hands-on guide for first-timers that walks the three creation paths from "ask vanilla Claude to build it" to "use /skill-creator with an eval harness." The intimidation gap dissolves quickly. SKILL.md is a markdown file with frontmatter. That is it.
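To make "a markdown file with frontmatter" concrete, here is a minimal sketch that renders and writes one. The helper names are hypothetical, and I am assuming `name` and `description` as the frontmatter fields; check the skills documentation for the full set, including the `user-invocable` flag mentioned earlier.

```python
from pathlib import Path


def render_skill(name: str, description: str, steps: list[str]) -> str:
    """Render a minimal SKILL.md: YAML frontmatter, then the playbook body."""
    # Keep the description free of colons and quotes, or quote it as YAML requires.
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",
    ])
    body = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{frontmatter}\n\n# {name}\n\n{body}\n"


def write_skill(skills_dir: Path, name: str, description: str, steps: list[str]) -> Path:
    """Write <skills_dir>/<name>/SKILL.md and return its path."""
    target = skills_dir / name / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_skill(name, description, steps), encoding="utf-8")
    return target
```

Calling `write_skill(Path(".claude/skills"), "incident-review", "Run the weekly incident review in our format", [...])` would drop the file where a project-level skill lives. The pasted playbook becomes the numbered body.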

Q4: "How do hooks work?"

If Q1 was the most-asked, Q4 was the surprise. The energy in chat shifted from familiar-but-confused to "I have never heard of this before." Hooks are the layer the room had not been formally introduced to, and the gap showed up in the questions immediately.

The canonical line first. Hooks are deterministic. Anthropic's hooks guide opens with it: "They provide deterministic control over Claude Code's behavior, ensuring certain actions always happen rather than relying on the LLM to choose to run them." That is the single sentence to memorize. CLAUDE.md asks. A skill teaches. A hook enforces.

The events you most often want are PreToolUse, PostToolUse, and SessionStart. PreToolUse fires before any tool call (Bash, Edit, Write, Read, etc.) and can block the call before it executes. PostToolUse fires after, which means it can react and log but cannot undo. SessionStart fires once per session and is where you inject anything Claude needs at the top of every conversation, including a freshly computed timestamp or a status snapshot of the repo. Anthropic's hooks reference groups events into three cadences (session, turn, and tool-call), but in production those three events alone cover the majority of cases.

The piece that catches more people than any other is the exit code.

Caution

Only exit code 2 blocks the action. Anthropic's hooks reference is explicit: "Claude Code treats exit code 1 as a non-blocking error and proceeds with the action, even though 1 is the conventional Unix failure code. If your hook is meant to enforce a policy, use exit 2." Your hook also needs to write its blocking message to stderr, not stdout, because Claude Code ignores stdout JSON when the exit code is 2 and feeds stderr text back to the model as the rejection reason. If you wrote a hook in 2024 habits and used exit 1 for the "block this" path, your enforcement layer is silently doing nothing.
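Here is what the exit-code rule looks like in a guard for the "never edit .env" case. Treat it as a sketch under assumptions: I am assuming the hook receives a JSON payload on stdin with a `tool_input.file_path` field, and the function names and message wording are illustrative only.

```python
"""PreToolUse hook sketch: refuse edits to .env files."""
import json
import sys
from pathlib import PurePath


def is_protected(file_path: str) -> bool:
    """True for .env and variants such as .env.local."""
    name = PurePath(file_path).name
    return name == ".env" or name.startswith(".env.")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one hook event."""
    # Assumed payload shape: {"tool_name": ..., "tool_input": {"file_path": ...}}
    file_path = event.get("tool_input", {}).get("file_path", "")
    if file_path and is_protected(file_path):
        # Exit code 2 is the only code that blocks; exit 1 would let the edit through.
        return 2, f"Blocked: {file_path} is protected; edit it by hand."
    return 0, ""


def main() -> int:
    """Read one event from stdin, print any rejection reason to stderr."""
    try:
        event = json.load(sys.stdin)
    except json.JSONDecodeError:
        return 0  # unreadable payload: this sketch fails open; a strict policy may prefer 2
    code, message = decide(event)
    if message:
        print(message, file=sys.stderr)  # stderr, not stdout, carries the reason
    return code


# When saved as the hook script, finish the file with:
#     sys.exit(main())
```

Keeping the decision in `decide` means the policy can be unit-tested without spawning Claude Code at all, which is worth having for anything labelled enforcement.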

The other thing the room wanted to know was where hooks live. They sit in settings.json files in three locations: .claude/settings.json at the project root (committed to git, the right home for org-mandated guardrails), ~/.claude/settings.json (personal, your machine only), and a managed-policy path that platform teams own (the same enforcement lever as managed CLAUDE.md). For repos with strict rules that need to survive after /compact, a SessionStart hook with the compact matcher re-injects whatever context you specify whenever Claude auto-compacts the conversation. That single pattern fixes more "Claude forgot the rule halfway through" tickets than anything else I have configured at scale. If your hooks worked on Opus 4.6 but mis-fire on 4.7's stricter literal reading, the opus-compatibility-scanner walks the seventy patterns that catch the regressions.
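For reference, this is roughly the shape of a project-level settings file that wires up both patterns, generated from Python so the nesting is easy to read. It is a sketch: the two commands and their paths are hypothetical, and the nesting of event, matcher, and command entries reflects my understanding of the hooks schema, so verify it against the hooks reference before committing it.

```python
import json

# Hypothetical commands; point these at wherever your hooks actually live.
BLOCK_ENV_EDITS = "python3 .claude/hooks/block_env_edits.py"
REINJECT_RULES = "cat .claude/hooks/post-compact-rules.md"

settings = {
    "hooks": {
        # Runs before file-editing tool calls; the script decides whether to block.
        "PreToolUse": [
            {
                "matcher": "Edit|Write",
                "hooks": [{"type": "command", "command": BLOCK_ENV_EDITS}],
            }
        ],
        # Runs when a session resumes after compaction; the command's output
        # is assumed to be fed back into context.
        "SessionStart": [
            {
                "matcher": "compact",
                "hooks": [{"type": "command", "command": REINJECT_RULES}],
            }
        ],
    }
}

# Paste the printed JSON into .claude/settings.json at the project root.
print(json.dumps(settings, indent=2))
```

The same structure applies to the personal and managed-policy settings files; only the location changes.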

Q5: "What are plugins and skills, where do we get them, and how do we use them?"

The vocabulary question, asked by an engineer in chat with the kind of slight defensiveness that comes from having read three blog posts already and emerged with less clarity than before. Skills, plugins, the marketplace, slash commands, subagents: same words, slightly different meanings depending on which post you read.

Here is the layered version. Skills are markdown files (SKILL.md with a frontmatter spec, optional reference files, optional executable scripts). They live in a .claude/skills/ directory at the individual or project level. Plugins are the packaging layer that bundles skills, hooks, subagents, and MCP servers into one installable unit. Anthropic's plugins docs put it cleanly: "Plugins bundle skills, hooks, subagents, and MCP servers into a single installable unit from the community and Anthropic." A plugin is not a different kind of skill. A plugin is a container of skills (and hooks, and the rest), with a plugin.json manifest, a version, and a marketplace listing. The marketplace is a catalog of plugins, not a catalog of individual skills. Anthropic ships an official marketplace that auto-attaches at startup, and there are community marketplaces (Trail of Bits' bundle, Jesse Vincent's Superpowers, others) you opt into per project.

Standalone or plugin? The features-overview row that decides this is "a second repository needs the same setup → package it as a plugin." If the skill is private to one repo and you do not need to share it, ship it standalone in .claude/skills/. If you need it in three repos and want to version it independently, package it as a plugin. The same skill code can live in either shape; the plugin manifest is the only difference.
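To show how small the difference is, the sketch below copies a standalone skill into a plugin directory and writes a manifest beside it. The layout is an assumption on my part (skills under `skills/`, manifest at `.claude-plugin/plugin.json`), as are the manifest fields beyond the version; confirm both against the plugins documentation. `package_skill` is a hypothetical helper, not part of any CLI.

```python
import json
import shutil
from pathlib import Path


def package_skill(skill_dir: Path, plugin_root: Path, plugin_name: str, version: str) -> Path:
    """Copy one standalone skill into a plugin directory and write its manifest.

    Assumed layout: <plugin_root>/skills/<skill>/SKILL.md for the skill and
    <plugin_root>/.claude-plugin/plugin.json for the manifest.
    """
    # The skill files are copied unchanged; only the wrapper is new.
    shutil.copytree(skill_dir, plugin_root / "skills" / skill_dir.name, dirs_exist_ok=True)
    manifest = {
        "name": plugin_name,
        "version": version,
        "description": f"Shared packaging of the {skill_dir.name} skill",
    }
    manifest_path = plugin_root / ".claude-plugin" / "plugin.json"
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest_path
```

The version field is what the third repo gets that a copied directory never does: a way to say which revision of the playbook it is running.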

The official marketplace at claude.com/plugins shows a live install count for each plugin, among them:

Frontend Design (Anthropic)
Superpowers (Jesse Vincent)
GitHub plugin (Anthropic)
Playwright plugin (Anthropic)

The room had a live disagreement on plugin adoption that surfaced in chat. One camp wanted vanilla Claude Code with team-authored skills only, on the theory that every dependency is a future migration. The other camp wanted Superpowers and the Trail of Bits suite by default, on the theory that a 476,000-install plugin has done more battle-testing than an internal skill written on a Friday. Both positions are legitimate. The point I made in the room: whichever you pick, pick it deliberately, and tell your team which. The failure mode is not "vanilla vs. plugins" but "no one knows which we use, so half the team installs Superpowers and half does not, and the rules drift." That drift is the configuration debt that compounds across a 100-engineer org.

Q6: "Can skills be used in Claude AI and Claude Desktop too?"

This question came from a product manager, on voice. It changed the shape of the answer I had been giving. The engineering room had been asking about Claude Code skills. The PM was asking about Claude.ai skills, which are related but not the same.

Anthropic's Agent Skills standard was published October 16, 2025. The same SKILL.md format works across Claude Code, Claude.ai (the web product), Claude Desktop, and the API. The portability is real. So is the gotcha: each surface has its own skill registry, and they do not share state. Installing a skill in Claude Code does not make it available in Claude.ai. Uploading a skill in Claude.ai does not make it available in Claude Code. The format is portable; the deployment is per surface.

For Claude.ai specifically, the support article walks the path: Customize > Skills > Upload ZIP. The skill is private to your account unless your org provisions it through Team or Enterprise plans, and it requires code execution to be enabled. That last point sometimes gets missed: a skill that calls a script to do anything useful needs the underlying execution sandbox turned on. Claude Desktop currently supports skills via the plugin system; local-filesystem skills (the convenience pattern Claude Code users are used to) are an open feature request as of early 2026, not yet shipped.

For technical leaders running a rollout that spans engineering and product, this matters. If your PMs are using Claude.ai with custom skills, that is a different deployment path with different governance than the Claude Code skills your engineers ship in .claude/skills/. The same SKILL.md file may live in three places, in three states of currency, with no central registry. If you have not set up a single source of truth for skill versions across surfaces, that is the drift problem in waiting. The Claude vs Claude Code post covers the foundational surface distinction in more depth and is the right pre-read for anyone scoping a multi-surface deployment.
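With no central registry, a content hash is the cheapest drift detector. A minimal sketch, assuming you keep an exported copy of each surface's SKILL.md somewhere on disk; the surface labels and paths are hypothetical and nothing here talks to Claude.ai or Claude Code directly.

```python
import hashlib
from pathlib import Path


def fingerprint(skill_file: Path) -> str:
    """Short content hash of one SKILL.md copy."""
    return hashlib.sha256(skill_file.read_bytes()).hexdigest()[:12]


def find_drift(copies: dict[str, Path]) -> dict[str, str]:
    """Map each surface to its fingerprint; return {} when all copies match."""
    prints = {surface: fingerprint(path) for surface, path in copies.items()}
    return prints if len(set(prints.values())) > 1 else {}
```

Run in CI against the canonical copy in git plus whatever was last uploaded elsewhere, a non-empty result is the signal that one surface is running a stale version of the playbook.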

Five triage questions that replace the docs hunt

Synthesis. The room could not have read its way out of the questions it was asking. The taxonomy of the questions itself was the diagnostic. So the rollout playbook I keep coming back to is not a longer guide. It is a smaller set of triage questions a tech lead should be able to answer about any directive before deciding which mechanism to use.

First: must this happen every time, no exceptions? If yes, the answer is a hook. CLAUDE.md is a request; a hook is a guarantee. If the answer is "mostly but not always," CLAUDE.md or a skill is the right home and you are having a different conversation.

Second: does Claude need to know this in every session? If every session, it belongs in CLAUDE.md (kept under 200 lines). If only sometimes, it belongs in a skill, since the body loads on demand and costs almost nothing in context until invoked.

Third question, deceptively simple. Fact or procedure? Facts and conventions go in CLAUDE.md. Multi-step procedures go in skills. The operational test is Anthropic's "third time pasting it" rule from Q3 above.

Fourth: does this need to work across multiple repositories? Plugin if yes, standalone .claude/ if no. Plugins are the distribution and versioning layer, not a different kind of skill.

The fifth question is the hardest because the symptom looks identical to all of the above. If we are troubleshooting "ignored directives," check in this order: file length, session age, layer routing. The third cause is the most misdiagnosed in my experience: it looks exactly like a model bug, and that is what it usually gets called.
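The five questions are close enough to mechanical that they fit in a few lines. This is a sketch of the ordering only; the `Directive` fields and `route` are names I made up, and limiting plugin packaging to hooks and skills is my simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directive:
    must_always_hold: bool      # first question: every time, no exceptions?
    needed_every_session: bool  # second question: should Claude know it in every session?
    is_procedure: bool          # third question: multi-step procedure rather than a fact?
    multi_repo: bool            # fourth question: needed in more than one repository?


def route(d: Directive) -> str:
    """Pick a mechanism by asking the first four questions in order."""
    if d.must_always_hold:
        mechanism = "hook"
    elif d.is_procedure or not d.needed_every_session:
        mechanism = "skill"
    else:
        mechanism = "CLAUDE.md"
    # Simplification: only hooks and skills are treated as plugin contents here.
    if d.multi_repo and mechanism != "CLAUDE.md":
        return f"{mechanism}, packaged as a plugin"
    return mechanism


# Fifth question: the order in which to check an "ignored directive" report.
TROUBLESHOOTING_ORDER = ("file length", "session age", "layer routing")
```

The ordering matters more than the code. Asking "must this always hold?" first is what keeps guard rules out of CLAUDE.md in the first place.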

Across the rollouts I have supported, the "Claude Code is ignoring our rules" tickets a tech lead inherits resolve into one of three rollout failure modes most of the time. The CLAUDE.md has grown past 200 lines and lost the rule in the noise. Hooks were never written for the rules that needed to fire deterministically. Sessions are running long without /clear, and context pollution is doing the work. The fix in every case is a triage decision, not a longer doc.

Better triage replaces the 30-page guide that gets read once and never referenced. It is the asset a tech lead can hand a new hire on day one and have them route correctly by week two.

Six questions, one missing axis

Walk back through the six. What goes in CLAUDE.md and where else does it live? Routing question. Why does Claude Code keep ignoring directives? Hygiene plus layer question. What skills should we use, and when in the dev cycle? Trigger question. How do hooks work? Mechanism question. What are plugins, skills, and the marketplace? Vocabulary question. Can skills be used across Claude surfaces? Portability question.

None of the six is exotic. None requires a research paper to answer. What links them is the assumption underneath: that the four mechanisms plus the marketplace all operate at the same level of abstraction. They do not. The four mechanisms (CLAUDE.md, skills, hooks, plugins) differ on lifecycle, enforcement, and scope; the marketplace is the distribution channel for plugins, one level removed. Five different jobs, not five flavors of the same thing. Once those layers are visible the routing is mostly mechanical. The thing the room needed was not a longer document. It was the triage layer that lets each engineer, PM, and ops lead route their own question without escalating it.

If you are running a rollout, the practical move is to teach the axes first and the mechanisms second. I work through this directly inside teams' codebases as part of Claude Code Infrastructure Setup engagements: the CLAUDE.md(s), the skills, the hooks, the path-scoped rules, the plugin posture (vanilla or Superpowers or somewhere between), all configured around how the team already ships. If you would rather walk it together first, book 15 minutes and we will look at your team's workflow before deciding what changes.

The answer to none of the six questions was "go read the docs." The docs were fine. The room needed the layer above them.