I get why you haven't built one yet.
The first time someone tells you about Claude Skills, the word "skill" sounds friendly. Then you click into the docs, see a YAML frontmatter spec, a references/ directory convention, an allowed-tools field, and a sentence about progressive disclosure across three context-window tiers, and the friendly word starts to feel like a mini-program in a new tech stack you didn't sign up for. So you close the tab and go back to pasting that long prompt into a fresh chat, the same way you have for the last six months.
I know, because the people I work with describe it the same way every time: "It feels like I'd be writing a program." That's the intimidation gap. And it's a shame, because skills are easier than that, more useful than most beginners realize, and more portable than most experts realize.
Some scale to set the stakes. As of February 2026 there were 40,285 publicly listed Claude Skills in the marketplace, and the Superpowers plugin alone (a single bundle of skills, not a single skill) shows 476,245 installs on its official Anthropic page. When Anthropic announced Skills last October, Rakuten was the launch quote: tasks that "once took a day" now took "an hour." Tessl reported on three production skills that were measured using Anthropic's own eval methodology: Cisco's software-security skill at 1.78x improvement over a no-skill baseline, ElevenLabs at 1.32x, Hugging Face's tool-builder at 1.63x. This isn't a small feature.
Here's the secret I want you to walk away with: you can ask vanilla Claude to build you a skill and it will do a respectable job in five minutes of conversation (figure another twenty or so to drop the file in the right place and confirm it fires). There are also skills built specifically for the purpose of building other skills. Anthropic ships /skill-creator in their public repo. Jesse Vincent's Superpowers plugin ships /writing-skills inside the official Claude Code plugin marketplace. Trail of Bits publishes a skill-improver, a workflow-skill-design plugin, and an ask-questions-if-underspecified clarifier. None of these were obvious to me when I started, and discovering them changed what I could ship.
This post is the orientation I wish I had when I built my first skill. What a skill is. Why it's worth your time. Three concrete paths to creating one, ranked from easiest to most rigorous. The frontmatter you need if you'd rather hand-write it. Where the resulting file runs (more places than you think). And honest notes on when you should reach for something else instead.
What a Claude Skill Is, and Why It's Less Scary Than It Looks
A skill is a folder with one required file in it: SKILL.md. The file has a tiny YAML frontmatter at the top with two required fields, name and description, and then a body of Markdown instructions that Claude reads when the skill triggers. That's the whole shape. Optional subdirectories like scripts/, references/, and assets/ exist for skills that need executable code, deeper documentation, or bundled resources, but a perfectly good first skill is a single Markdown file in a single folder.
The clever bit is how Claude loads it. Anthropic calls this progressive disclosure, and the cost structure is what makes installing many skills cheap.
Translation: Claude knows the skill exists from the moment your session starts (Level 1: one sentence of metadata, ~100 tokens), but it doesn't pay the cost of reading the full SKILL.md (Level 2) until the Level 1 description matches what you're trying to do. Scripts and reference files inside the skill folder (Level 3) cost nothing until Claude reaches for them. You can install a hundred skills and your context window won't notice.
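To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the ~100-token Level 1 figure quoted above; the function name and the 2,000-token body size are invented for illustration, not measured values.

```python
# Rough context-cost model for progressive disclosure.
# Assumption: ~100 tokens of Level 1 metadata per installed skill (the figure
# quoted in the text). The body size used below is a made-up example.
LEVEL1_TOKENS_PER_SKILL = 100


def session_cost(installed: int, triggered_body_tokens: list[int]) -> int:
    """Tokens spent on skills: metadata for every installed skill,
    plus the SKILL.md body only for skills that actually triggered."""
    return installed * LEVEL1_TOKENS_PER_SKILL + sum(triggered_body_tokens)


# 100 skills installed, none triggered: only metadata is loaded.
print(session_cost(100, []))      # 10000
# Same 100 skills, one of them triggered with a hypothetical 2,000-token body.
print(session_cost(100, [2000]))  # 12000
```

The point of the model is the shape, not the exact numbers: the per-skill cost is flat and small, and the larger cost only appears for the skill you are actually using.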
The frontmatter that makes Level 1 work has constraints worth knowing up front:
    ---
    name: code-archeology
    description: Use when investigating why a piece of code exists. Walks git history, PR threads, JIRA links, and Confluence references to produce a clean changelog with deep links to original decisions. Triggers on questions like "where did this come from" or "why was this added".
    ---

    # Code Archeology Skill

    When invoked, follow these steps:
    1. Identify the file path and line range the user is asking about.
    ...

name is at most 64 characters (lowercase letters, numbers, and hyphens only), and on Claude surfaces specifically it cannot include the reserved words "anthropic" or "claude" (the open standard at agentskills.io is more permissive, but if your skill ships to Claude.ai or the Anthropic API, the reserved-word rule applies). description is at most 1024 characters and is the most important field in the entire file. It is what Claude reads to decide whether to trigger the skill, so a vague description means the skill never fires. The Anthropic docs are explicit that Claude has a tendency to "undertrigger" skills, so they recommend descriptions that are "a little bit pushy." Superpowers takes the opposite tack and recommends "Description = When to Use, NOT What the Skill Does." Both are right; I'll get to that tension shortly.
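If you want to sanity-check those two fields before installing anything, a small checker along these lines would do. This is my own sketch of the constraints as described above, not an official validator; the function name is hypothetical, and the non-empty description check is my assumption.

```python
import re

# Reserved on Claude surfaces, per the constraints described in the text.
RESERVED_WORDS = ("anthropic", "claude")
# Lowercase letters, numbers, and hyphens only.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def validate_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means both fields pass."""
    problems = []
    if not 1 <= len(name) <= 64:
        problems.append("name must be 1-64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name may contain only lowercase letters, numbers, and hyphens")
    for word in RESERVED_WORDS:
        if word in name:
            problems.append(f"name contains reserved word {word!r}")
    if not description.strip():
        problems.append("description must not be empty")  # assumption: blank is never useful
    if len(description) > 1024:
        problems.append("description must be at most 1024 characters")
    return problems
```

Calling `validate_frontmatter("code-archeology", "Use when investigating why a piece of code exists.")` returns an empty list, while a name like `claude-helper` comes back with a reserved-word problem.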
The body below the frontmatter is Markdown. Just Markdown. There is no DSL to learn. If you can write a runbook for a junior engineer, you can write the body of a skill. That's the part teams miss.
A previous post covers the structural foundation in more depth if you want a slower walk through the artifact. The rest of this post is about creating one.
Why Build a Skill at All? The Value Versus Pasted Prompts
If you have a long prompt you keep pasting into chat, that prompt is a candidate for a skill. The Claude Code docs put it bluntly: "Create a skill when you keep pasting the same playbook, checklist, or multi-step procedure into chat, or when a section of CLAUDE.md has grown into a procedure rather than a fact."
The shift from pasted prompt to skill changes three things at once.
You stop forgetting parts of it. A pasted prompt erodes. You drop a paragraph one week to fit your context window, drop a different paragraph the next, and three months later you have a thinner version of the original that nobody can audit. A SKILL.md sits in version control. It cannot quietly lose half its instructions between conversations.
You stop pasting it. Once installed, the skill auto-triggers on a description match. You ask Claude to "investigate why this commit was added" and the relevant skill loads automatically, without you reaching for a paste buffer or remembering what the prompt was called this week.
You install many of them cheaply. The Level 1 metadata cost (~100 tokens per skill) is so low that "install fifty skills" is a perfectly reasonable answer to "I have fifty workflows." Compare that to fifty long prompts, fifty paste buffers, and fifty chances to paste the wrong one.
The measurable side has been measured. Tessl reported eval results on three production skills using Anthropic's own benchmark methodology: Cisco's software-security skill at 1.78x improvement (84% overall score), ElevenLabs' text-to-speech skill at 1.32x, Hugging Face's tool-builder at 1.63x. These compare skill-guided agent behavior against a no-skill baseline, not against a hand-tuned prompt, so don't read them as "1.78x better than your existing workflow." Read them as the best published evidence that skills meaningfully change what an agent will do.
For me the value showed up in a more specific shape. My first skill is a code-archeology workflow for a large engineering organization with hundreds of engineers spread across teams, products, and dozens of codebases (some inherited from acquisitions). When you land in unfamiliar code, figuring out the why of a function or a line of code or a particular change can derail your morning. I used to grab my Sherlock hat and pipe and walk the trail by hand: git log, git blame, the linked PR, the review thread, the linked JIRA, the Confluence doc the JIRA references, repeat. With a skill, I ask Claude one question and get a clean changelog with every link in it. The first time it worked end-to-end I felt slightly silly for not having built it sooner.
The pattern transfers cleanly outside engineering. Building a queryable UX research corpus leans on the same shape: the research findings live as tagged markdown files, and a research-conventions skill interprets queries against them (how to weight a 2024 mobile finding against a 2026 desktop question, what confidence levels mean, what to do when two findings conflict). Different domain, identical artifact.
That's the case for skills, in one sentence: they turn the same recurring 30-minute task into a 30-second one, every time.
Three Ways to Create Your First Skill
There are three good paths from blank page to working skill. Pick the one that matches your tolerance for ceremony, not the one that sounds most rigorous. Path A gets you a working SKILL.md draft in about five minutes of conversation. Path B and Path C are how you make that skill genuinely good.
Path A: Just ask Claude
If you have never created a skill, this is where you start. Open any Claude surface (Claude.ai, the desktop app, the iOS app, Claude Code in your terminal) and tell Claude what you want.
"I'd like to create a Claude skill that does X. Walk me through it and produce the SKILL.md when we're done."
That's it. Claude already knows the skill format and the frontmatter constraints. It will ask clarifying questions about scope, suggest a name, draft a description, write the body, and hand you a file you can drop into .claude/skills/your-skill-name/SKILL.md (project-scoped) or ~/.claude/skills/your-skill-name/SKILL.md (personal, available across all your projects). My first shipped skill was created exactly this way, and the v1 worked. It was narrow, it did what I asked, and only what I asked, but it worked.
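The "drop the file in the right place" step is just a directory and a write. A minimal sketch, assuming the two Claude Code locations named above; the function name and the `project_root` / `home` parameters are hypothetical conveniences so you can point it at a scratch directory first.

```python
from pathlib import Path
from typing import Optional


def install_skill(
    skill_md: str,
    name: str,
    personal: bool = False,
    project_root: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Path:
    """Write SKILL.md into the project-scoped or personal skills directory.

    project-scoped: <project_root>/.claude/skills/<name>/SKILL.md
    personal:       <home>/.claude/skills/<name>/SKILL.md
    """
    if personal:
        base = (home or Path.home()) / ".claude" / "skills"
    else:
        base = (project_root or Path.cwd()) / ".claude" / "skills"
    target = base / name / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)  # create the skill folder if needed
    target.write_text(skill_md, encoding="utf-8")
    return target
```

In practice you will probably do this by hand the first time. The sketch is mainly here to show that there is nothing more to "installing" a Claude Code skill than this one folder and one file.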
This is the path that demolishes the intimidation barrier. You don't need to learn the YAML schema, you don't need to memorize the directory conventions, you don't need to read the docs. You describe the workflow you want and Claude produces the artifact. If something's off, you tell Claude what's off and it adjusts. The whole cycle takes minutes.
The catch is that vanilla Claude does a respectable job of building exactly what you ask for, and only that. Edge cases you didn't mention won't get covered. A vague description won't get pushed back on. The fallback you forgot stays missing. The first time I rebuilt my code-archeology skill through the next two paths, I felt the difference immediately: they filled gaps I hadn't realized existed. That's where the next two paths come in.
Path B: Anthropic's /skill-creator
Anthropic ships a skill specifically for creating other skills. You'll find it in the public anthropics/skills repo alongside their other published skills. Install it the same way as any other skill (drop the folder under .claude/skills/skill-creator/, or install it via the plugin marketplace), and then trigger it by saying you want to create a new skill.
What you get is an interview. The skill-creator asks you questions in a specific order: what should the new skill enable, when should it trigger, what does the expected output look like, what edge cases need handling, what dependencies does it have. It then drafts the SKILL.md, and (this is the part that surprised me) it scaffolds an evaluation harness alongside the skill. The eval harness runs paired comparisons (Claude with the skill versus Claude without) on test cases you provide, aggregates results into benchmark.json, and surfaces the pass-rate and token-cost delta. If you've ever wondered whether your skill is making things better or just adding noise, this is how you find out.
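To make "pass-rate and token-cost delta" concrete, here is the arithmetic in miniature. This is a sketch of the idea only: the `RunResult` record and `summarize` function are hypothetical, and I am not reproducing the actual layout of benchmark.json.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One test case outcome (hypothetical record, not the harness's format)."""
    passed: bool
    tokens: int


def summarize(with_skill: list[RunResult], without_skill: list[RunResult]) -> dict:
    """Compare paired runs: positive pass_rate_delta means the skill helped,
    positive token_delta means the skill cost more tokens on average."""
    if not with_skill or not without_skill:
        raise ValueError("both run lists must be non-empty")

    def pass_rate(runs: list[RunResult]) -> float:
        return sum(r.passed for r in runs) / len(runs)

    def mean_tokens(runs: list[RunResult]) -> float:
        return sum(r.tokens for r in runs) / len(runs)

    return {
        "pass_rate_delta": pass_rate(with_skill) - pass_rate(without_skill),
        "token_delta": mean_tokens(with_skill) - mean_tokens(without_skill),
    }
```

A skill that raises the pass rate while also raising token cost is a trade-off you get to judge; a skill with a zero or negative pass-rate delta is noise, whatever it costs.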
The other thing it does is description optimization. The skill ships a run_loop.py script that takes a 20-query trigger eval set, splits it 60/40 train/test, runs five iterations of description tuning, and selects the winner by test-set score (not training, to prevent overfitting). The first time I ran this on my code-archeology skill, the eval revealed that around half my test queries triggered correctly on the original description; nearly all of them triggered after the optimizer's tuned version. That delta is invisible to you when you're hand-writing.
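The split-then-select logic is worth seeing in isolation, because selecting on the held-out set is the whole trick. What follows is not Anthropic's run_loop.py and I have not reproduced its internals. It is a sketch of the same idea, where `fires` is a hypothetical stand-in for "ask Claude whether this description would trigger on this query", and the step that proposes new descriptions from training failures is left out.

```python
import random
from typing import Callable, Sequence

Query = tuple[str, bool]  # (query text, should the skill trigger?)


def split_queries(queries: Sequence[Query], train_fraction: float = 0.6, seed: int = 0):
    """Shuffle deterministically, then split into (train, test)."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(description: str, queries: Sequence[Query],
             fires: Callable[[str, str], bool]) -> float:
    """Fraction of queries where the trigger decision matches the expectation."""
    if not queries:
        raise ValueError("need at least one query")
    correct = sum(fires(description, text) == expected for text, expected in queries)
    return correct / len(queries)


def pick_winner(candidates: Sequence[str], test_set: Sequence[Query],
                fires: Callable[[str, str], bool]) -> str:
    """Select by held-out test score, not training score, to avoid overfitting."""
    return max(candidates, key=lambda d: accuracy(d, test_set, fires))
```

With 20 queries and the default fraction, `split_queries` returns 12 training queries and 8 test queries. Candidates are tuned against the 12 and judged on the 8.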
Path C: Superpowers' /writing-skills
Jesse Vincent's Superpowers plugin lives in the official Claude Code plugin marketplace and ships a /writing-skills skill with a different philosophy. Where Anthropic's skill-creator runs an interview-then-eval loop, Superpowers applies test-driven development to skill authorship. The Iron Law in the SKILL.md is literal: "NO SKILL WITHOUT A FAILING TEST FIRST."
The flow is RED-GREEN-REFACTOR ported from TDD to documentation. You write a test scenario that should pass once the skill exists, run it on a subagent without the skill (RED, the agent fails or invents the wrong workflow), draft the skill (GREEN, the test passes), then refactor the skill against the rationalizations you observed (the agent's verbatim excuses for ignoring instructions become the next round of edge cases to plug). It also enforces a strict word budget: getting-started skills under 150 words, frequently-loaded skills under 200, anything else under 500.
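The word budget is easy to check mechanically. A minimal sketch, assuming a plain whitespace split counts as "words" (I don't know how Superpowers itself counts); the category labels are my own shorthand for the three tiers above.

```python
# Word ceilings quoted in the text; the category labels are my own shorthand.
WORD_BUDGETS = {"getting-started": 150, "frequently-loaded": 200, "other": 500}


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumption about what 'words' means)."""
    return len(text.split())


def within_budget(body: str, category: str = "other") -> bool:
    """True if the body is strictly under the ceiling for its category."""
    return word_count(body) < WORD_BUDGETS[category]
```

Running this on a draft before you test it is a cheap way to notice that the skill has quietly turned into an essay.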
The two meta-skills disagree in one specific place. Anthropic says descriptions should be "a little bit pushy" because Claude undertriggers. Superpowers says descriptions should describe when to use, never what the skill does, because Claude will follow the description summary instead of reading the skill body. Both observations are true. My read: write the description as Superpowers recommends (when to use, not what), but tune it pushier than your instinct says, then validate with Anthropic's trigger eval to land in the right spot.
Anthropic /skill-creator: interview, eval, optimize
- Adaptive interview to capture intent + edge cases
- Scaffolds an eval harness alongside the SKILL.md
- Paired with-skill / without-skill benchmark runs
- 5-iteration description optimizer (60/40 train/test)
- Packages the skill into a distributable zip archive
Superpowers /writing-skills: TDD for skill authoring
- Iron Law: no skill without a failing test first
- RED-GREEN-REFACTOR cycle ported to documentation
- Strict word-budget per skill (150 / 200 / 500)
- Rationalization tables: capture and plug agent excuses
- Description = when to use, NOT what the skill does
And the new entrant: Trail of Bits
Trail of Bits' skills repo deserves a mention too. They ship a skill-improver plugin (an iterative refinement loop using automated fix-review cycles), a workflow-skill-design plugin (design patterns for workflow-based skills with a built-in review agent), and an ask-questions-if-underspecified clarifier. I haven't had a chance to test these yet, but I'm excited to and I'll update this post once I do. If you ship a security-adjacent skill, their security-focused skills (the rest of the repo) are worth a look as well.
The Combined Loop, and When to Use It: RED, DRAFT, GREEN, REFACTOR, OPTIMIZE, SHIP
You don't need this for your first skill, or your second. Path A is fine for both. Reach for this loop when you've shipped three or four, one of them is quietly underperforming, and you want to know why instead of guessing. Anthropic's /skill-creator gives you the proof; Superpowers' /writing-skills gives you the discipline. The loop below blends them: the discipline to write the skill, and the evidence that it helps.
(This is my own framing, not a published Anthropic or Superpowers methodology.)
RED
Pick three or four scenarios where you expect the skill to help. `/writing-skills` runs each one on a subagent that doesn't have the skill installed and surfaces the verbatim rationalizations the agent invents to do the wrong thing. Those rationalizations are your edge cases. Capture them for DRAFT.
DRAFT
Write the SKILL.md using the Superpowers structure. Description in the 'use when...' form (not 'this skill does X'), body under the 500-word budget, and a rationalization table that names each excuse you saw in RED and the rule that defuses it.
GREEN
Run paired evals with Anthropic's harness on the same scenarios, with the skill versus without, and read the deltas: pass rate, time, tokens. The first time the numbers say your skill makes a specific class of input slightly worse, you know exactly which rule to revise.
REFACTOR
Read the failed transcripts from GREEN (the harness prints the output path when the run finishes). Find the new rationalizations the agent invented to dodge your rules, add them to the rationalization table, and re-test. Stop when the next GREEN run shows the pass rate has plateaued and the new failures surface no rationalization pattern you haven't already named.
OPTIMIZE
Tune the description so Claude triggers the skill reliably. Write ~20 trigger queries first: phrasings you'd expect to fire the skill, plus a few you'd expect not to. Anthropic's `/skill-creator` runs an optimizer that tests variants of your description against those queries and surfaces the winner. Apply it, then sanity-check it still says when to use the skill rather than what it does.
SHIP
Install the skill where you'll use it. Install paths for each Claude surface live in the 'Where Your Skill Will Run' section further down. If you're sharing it with teammates or uploading to claude.ai, package it as a zip archive first (the format the claude.ai Settings upload expects). Then put a quarterly reminder on your calendar to re-run OPTIMIZE as your usage patterns evolve. Descriptions that triggered reliably six months ago start missing the way you phrase things today.
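Packaging is a one-liner with the standard library. A minimal sketch, assuming the upload expects the skill folder itself at the root of the archive (check the current claude.ai upload instructions before relying on that); `package_skill` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def package_skill(skill_dir: Path, out_dir: Path) -> Path:
    """Zip a skill folder so the archive contains <name>/SKILL.md at its root.

    Assumption: the folder-at-root layout is what the upload expects.
    """
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        str(out_dir / skill_dir.name),  # archive path without the .zip suffix
        "zip",
        root_dir=skill_dir.parent,      # paths in the zip are relative to this
        base_dir=skill_dir.name,        # only this folder is archived
    )
    return Path(archive)
```

Keep `out_dir` outside the skill folder so the archive never tries to include itself.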
Two prerequisites before you start: install both /writing-skills (from the Superpowers plugin) and /skill-creator (from the anthropics/skills repo), and plan to run the whole loop in one continuous Claude Code session so each step can reference the prior step's outputs.
Which skill to invoke at each step, with a starter prompt for each:
| Step | Skill to invoke | Starter prompt |
|---|---|---|
| RED | Superpowers /writing-skills | "Walk me through RED for a skill that [does X]." |
| DRAFT | Superpowers /writing-skills | "Draft the SKILL.md from the rationalizations we captured." |
| GREEN | Anthropic /skill-creator | "Run paired evals on this skill against my scenarios." |
| REFACTOR | Superpowers /writing-skills | "Extend the rationalization table from these failed transcripts." |
| OPTIMIZE | Anthropic /skill-creator | "Run the description optimizer on this SKILL.md." |
| SHIP | Anthropic /skill-creator | "Where should this SKILL.md live, and do I need to package it?" |
What a rationalization-table entry looks like (the artifact you build during DRAFT and grow during REFACTOR):
| Rationalization the agent invents | Rule the SKILL.md adds to defuse it |
|---|---|
| "This change is small enough that I can skip the design phase." | "All new behavior goes through design first. Size is not an exemption." |
| "The user told me what they want, so I'll just build it." | "Run the design questions before coding, even when the user seems decisive." |
Your rationalizations will be specific to your skill's domain. The shape is what matters: every excuse the agent invents in RED or GREEN gets a verbatim row plus a one-sentence rule that names the failure mode.
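If you collect rationalizations as you go, you can keep them as data and render the table on demand. A small sketch of that; the function name is hypothetical, and the two-column layout simply mirrors the table above.

```python
def render_rationalization_table(rows: list[tuple[str, str]]) -> str:
    """Render (excuse, rule) pairs as a Markdown table for the SKILL.md body."""

    def cell(text: str) -> str:
        # Escape pipes so a verbatim excuse can't break the table layout.
        return '"' + text.replace("|", "\\|") + '"'

    lines = [
        "| Rationalization the agent invents | Rule the SKILL.md adds to defuse it |",
        "|---|---|",
    ]
    for excuse, rule in rows:
        lines.append(f"| {cell(excuse)} | {cell(rule)} |")
    return "\n".join(lines)
```

Keeping the rows in a list makes REFACTOR mechanical: append the new excuse and its rule, re-render, re-test.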
The rule of thumb I use:
- Superpowers gives you the WHAT and WHY. Discipline. Structure. The TDD frame that catches the failure modes you'd never have written a check for.
- Anthropic gives you the PROOF. Evals. Metrics. The optimizer that tunes your description from "triggers half the time" to "triggers nearly always" without you guessing.
A specific example: when I ran my code-archeology skill back through this loop, the OPTIMIZE step caught that my original description triggered correctly on common phrasings ("why does this exist") but missed less common ones ("who introduced this," "what change added this line"). The optimizer's winning description added two short trigger phrases I would not have thought to add, and trigger accuracy moved from roughly half the test set to nearly all of it. That's the difference between a skill people use and a skill that sits unused in ~/.claude/skills/ because Claude doesn't think to fire it.
The opus-compatibility-scanner is another example. I built that skill through every step of this loop before shipping; it audits a Claude Code project against 70 patterns (37 prose, 33 config) where Opus 4.6 idioms misfire on Opus 4.7. The build window was about 43 hours across 46 commits, and the eval-loop iterations are what kept the description tight enough to fire reliably without firing on unrelated questions.
If You'd Rather Hand-Write It: SKILL.md Frontmatter Fields That Matter
You don't have to use any of the meta-tools above. Plenty of people write SKILL.md by hand, and it works. If that's where you want to start, here's the field-by-field guide.
The two required fields in every SKILL.md, regardless of surface:
| Field | Constraint | What it does |
|---|---|---|
| name | ≤64 chars, lowercase + hyphens + numbers, no anthropic or claude reserved words, must match the directory name (open standard) | Identifier Claude uses to refer to the skill internally |
| description | ≤1024 chars | The Level 1 metadata Claude reads to decide whether to trigger the skill. Single most important field in the file. |
That's the complete list of required fields. Everything else is optional. In Claude Code specifically, several optional fields earn their place:
| Optional field | When to use it |
|---|---|
| disable-model-invocation: true | The skill should only run when the user explicitly invokes it. Removes it from Claude's auto-trigger consideration entirely. |
| allowed-tools | Restrict which tools Claude can use while the skill is active (e.g., Read, Glob, Grep for a read-only investigation skill). |
| model | Override the default model for this skill (e.g., a heavy synthesis skill might pin opus). |
| effort | Override effort level (low, medium, high, xhigh, max) for skills that need more or less reasoning budget. |
| paths | Glob patterns limiting auto-activation to matching files. Useful for codebase-specific skills. |
| argument-hint | Autocomplete hint shown when invoking the skill via slash command, e.g., [file-path]. |
| user-invocable: false | Hide from the slash command menu; only Claude can trigger. |
| context: fork | Run the skill in a forked subagent context to keep the parent context clean. |
A note: the allowed-tools field is honored by Claude Code CLI but not by the Agent SDK. If you're using the SDK, control tool access through the SDK's allowedTools option in your client configuration instead.
The body of the SKILL.md, below the frontmatter, is pure Markdown. The Anthropic recommendation is to keep it under 500 lines for performance; Superpowers says under 500 words. Either ceiling forces a useful discipline: if your skill needs more than 500 lines/words of instruction, you probably want to break out the deeper material into references/ files inside the skill folder. Claude reads references/*.md only when SKILL.md tells it to, which keeps Level 2 cheap.
Anthropic recommends writing the description in third person ("Processes Excel files and generates reports"), not first ("I help you process Excel files"), because the description is injected into Claude's system prompt and an inconsistent point of view causes discovery problems. For the body, they recommend explaining the why behind every instruction, not just the what: "If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag. If possible, reframe and explain the reasoning so that the model understands why the thing you're asking for is important."
That's the manual path, end to end. Two required frontmatter fields, a handful of optional ones, a Markdown body that explains the why and not just the what, and an optional references/ directory for anything that doesn't need to live in Level 2. If that sounds simpler than the schema page made it look the first time you read it, that's because it is.
Where Your Skill Will Run: Five Claude Surfaces, and Several That Aren't Claude
The same SKILL.md file runs in five distinct Claude surfaces, each with its own install path and quirks. It also runs in several non-Claude tools that adopted the format. This is the part nobody told me when I built my first skill, and it's why I now think of skills as portable agent capability rather than a Claude Code config file.
The five Claude surfaces:
| Surface | Where the skill lives | Plan / setup notes |
|---|---|---|
| Claude.ai (web) | Customize → Skills → upload | Custom skills require Pro, Max, Team, or Enterprise. Pre-built skills (PowerPoint, Excel, Word, PDF) are available to all users. |
| Claude Desktop (Mac / Windows) | Same as web; no separate UI | No platform-specific differences from claude.ai documented. |
| Claude Code (CLI) | .claude/skills/<name>/SKILL.md (project) or ~/.claude/skills/<name>/SKILL.md (personal) | Live change detection within a session. Enterprise > personal > project precedence. Plugin-installed skills get a plugin-name:skill-name namespace. |
| Claude API | container.skills parameter, three beta headers, REST /v1/skills for create/list/version/delete | Up to 8 skills per request. Custom skills are organization-shared (different from claude.ai). |
| Agent SDK | setting_sources=["user", "project"] required for discovery | SKILL.md allowed-tools field is ignored; control tools via SDK allowedTools config. |
The honest, slightly annoying note: skills do not sync across surfaces. Anthropic states this directly: "Custom Skills do not sync across surfaces. Skills uploaded to one surface are not automatically available on others." A skill uploaded to claude.ai is invisible to the API. A skill installed in Claude Code's filesystem is invisible to claude.ai. If you want the same skill in three places, you currently install it three times. This is fragmentation, and Anthropic acknowledges it.
I've personally used skills across Claude Code (CLI and the web app), Claude.ai web, the Claude iOS app (both the Claude.ai chat surface and Claude Code accessed through it), and Claude Desktop. The good news: the file is the same in every one. The friction is purely in the install step.
And then there are the non-Claude tools
This is the surprise. The Agent Skills format was published as an open standard in December 2025, and a meaningful number of non-Claude tools speak it natively now.
OpenAI Codex supports SKILL.md as a first-class concept, with the same progressive disclosure model and frontmatter conventions. Skills live in .agents/skills/<name>/ (current working directory), $REPO_ROOT/.agents/skills/, $HOME/.agents/skills/, or /etc/codex/skills/ for system installs. The frontmatter Codex requires is the same name and description Anthropic requires.
Gemini CLI supports the format too. Google Cloud's writeup says it plainly: skills "offer a portable solution that works in Gemini CLI and Antigravity as well as other coding agents like Claude Code, GitHub Copilot, or Cursor."
Windsurf has Cascade Skills with progressive disclosure parity. Cursor and GitHub Copilot are in the runtime-support list as well. As of early 2026 there are 30+ tools that natively recognize the SKILL.md format. A skill you author for Claude Code can be copied to a Codex .agents/skills/ directory and work without modification. The OpenAI Codex docs confirm this directly.
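Since the file is the same, "porting" a skill to Codex within a repo is a directory copy. A minimal sketch using the two repo-level paths mentioned above; `copy_skill_to_codex` is a hypothetical helper, and whether your particular skill behaves identically in both tools is something to verify rather than assume.

```python
import shutil
from pathlib import Path


def copy_skill_to_codex(repo_root: Path, name: str) -> Path:
    """Copy <repo>/.claude/skills/<name>/ to <repo>/.agents/skills/<name>/."""
    src = repo_root / ".claude" / "skills" / name
    dst = repo_root / ".agents" / "skills" / name
    if not (src / "SKILL.md").is_file():
        raise FileNotFoundError(f"{src} has no SKILL.md")
    # dirs_exist_ok lets a re-run refresh an earlier copy (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

A copy means two files to keep in sync. If that bothers you, treat one location as the source of truth and re-run the copy whenever the skill changes.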
The pasted-prompt portability bonus
One more anecdotal note. A simple skill (one with no scripts/, no references/, no bundled assets, just the SKILL.md frontmatter and a Markdown body) functions as a reusable prompt. I've taken the body of a few of my simpler skills and pasted them as raw prompts into LLMs that have no native skill runtime, and they produce useful behavior. Not as polished as native runtime, no auto-trigger, no progressive disclosure. But the instructions land and the model follows them.
I haven't seen this benchmarked anywhere. Treat it as my own observation, not a sourced finding. But if you have a "simple" skill (prose-only, no executable code, no separate reference files), the file you authored for Claude Code is also functioning as a reusable prompt artifact you can hand to whatever LLM you happen to be talking to that day. That's a different kind of portability, and it's worth knowing about.
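If you want to try this yourself, the only preparation is stripping the frontmatter so you paste just the instructions. A small sketch, assuming the frontmatter is delimited by `---` lines at the very top of the file; the function name is hypothetical.

```python
def skill_body(skill_md: str) -> str:
    """Return the Markdown body of a SKILL.md with the YAML frontmatter removed.

    If the file has no frontmatter (or it is never closed), the text is
    returned unchanged apart from surrounding whitespace.
    """
    lines = skill_md.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":  # closing delimiter
                return "\n".join(lines[i + 1:]).strip()
    return skill_md.strip()
```

What comes back is plain Markdown you can paste into any chat window. You lose the trigger description, so you are back to deciding for yourself when the prompt applies.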
When a Skill Isn't the Answer
Skills are powerful, but they are not a master key. Claude Code has plenty of other mechanisms for solving problems, and a skill isn't always the right one. Here are three honest cases where I'd push back on reaching for a skill.
The "scan-codebase-and-preload-context" skill pattern. I see a lot of YouTube demos of skills that scan the user's codebase to load relevant context into a Claude session before the real work starts. Most of these solve the problem at the wrong layer. A well-configured Claude Code project (a properly scoped CLAUDE.md, the right hooks, the right settings.json permissions) handles ambient context loading automatically and deterministically, without needing to fire a skill at all. The right shape for context preload is configuration, not procedure.
The caveat: if you want to deep-dive a specific part of a codebase to surface information relevant to work in a different part, that's a skill-shaped problem. The distinction is whether the context is ambient (every session needs it) or on-demand (this specific investigation needs it). Ambient context belongs in CLAUDE.md or hooks. On-demand investigation belongs in a skill. The Claude Code rule-routing decision tree post walks through this trade-off in depth.
Live data and external integrations. Skills are filesystem-based and (in the API runtime) sandboxed without network access. If your task requires hitting a live API, doing OAuth, or pulling current data from somewhere, you want MCP, not a skill. David Cramer of Sentry puts it well: "If skills teach you to cook, MCP provides the instruments that let you do it." Skills and MCP are complementary, not competing. A skill that reads current production metrics is the wrong tool; an MCP server that exposes those metrics, plus a skill that knows how to interpret them, is the right pair.
The eval loop is load-bearing, not the interview. An industry study reported by Tessl found that LLM-generated context files (drafted by an LLM with no validation loop) decreased agent task completion by 3% versus a no-context baseline, while developer-written context files improved it by 4%. The implication for skill authoring: don't trust the meta-skill interview to produce a measurably better skill on its own. The eval loop is what converts the draft into the improvement. If you use Path B or Path C above, you have to run the eval. Skipping that step lands you on the wrong side of the 3% delta.
A few smaller honest notes. Simon Willison, who calls Skills potentially bigger than MCP, still flags sandboxing and prompt injection as the meaningful concern: "We really need to figure out how best to sandbox these environments such that attacks such as prompt injections are limited to an acceptable amount of damage." Treat third-party skills with the same caution you'd treat a third-party npm package, especially anything that bundles scripts/ you don't read. And there are documented bugs in skill auto-discovery (the user-scoped ~/.claude/skills/ discovery bug being the prominent one) that occasionally bite, so if a skill you installed isn't firing, check the obvious thing first.
None of this is an argument against skills. It's an argument for putting them at the right layer for the right job, which is the same advice I'd give about any tool.
Build the First One
The gap between "I haven't built a skill" and "I have three I rely on" is much smaller than the docs make it look. The first one takes thirty minutes, end to end, if you take Path A and ask Claude to build it for you. The second one takes ten, because by then you know the shape. By the third one, you start noticing tasks you've been doing manually for months that obviously want a skill, and the question shifts from "should I build a skill" to "which skill should I build first."
If you're a first-timer, this is your starter loop:
- Pick one repetitive prompt or workflow you've been pasting for the last month.
- Open any Claude surface and say: "Help me create a SKILL.md for X."
- Drop the file Claude produces into ~/.claude/skills/your-skill-name/SKILL.md (Claude Code) or upload via Customize → Skills (Claude.ai).
- Use the skill twice this week. If it fires when you expect it to, you're done.
If you're an optimizer with a few skills already running, the next move is the combined RED-DRAFT-GREEN-REFACTOR-OPTIMIZE-SHIP loop. Run one of your skills through Anthropic's run_loop.py description optimizer with a 20-query trigger eval and see what falls out. If you've never measured trigger accuracy, you'll be surprised by where it lands.
Either way, build the first one. The skills you don't build are the same as the prompts you keep pasting, only quieter about how much they cost you.
If you'd rather have the first skills built around your specific codebase and workflows, the Custom Skill Development engagement is where that happens. For full Claude Code infrastructure (skills plus hooks plus settings, configured for the way your team works), Claude Code Infrastructure Setup covers the whole picture. Or if you'd like to talk through whether skills are the right next step at all, grab a fifteen-minute slot and we'll figure it out together.