
The Agentic Development Starter Guide: Plan, Audit, Implement, Verify

April 8, 2026 · 12 min read · Mitchel Lairscey

What if the reason AI coding agents disappoint you has nothing to do with the AI?

Only 29% of developers now trust AI-generated code to be accurate, down from 40% a year earlier. Not because the tools got worse. They got dramatically better. The problem is how developers use them: type a prompt, accept whatever comes back, debug for an hour when it breaks something three files away.

A randomized controlled trial by METR found that experienced open-source developers were 19% slower when using AI tools. The kicker? Those same developers believed they were 20% faster. The perception gap is almost perfectly inverted.

I've seen this pattern from the other side. Across agentic development workflows that I built and trained teams on at an enterprise org, the difference between teams that saw order-of-magnitude acceleration and teams that got frustrated came down to one thing: workflow discipline. Not smarter prompts. Not better models. A structured cycle of planning, auditing, implementing, and verifying that turns an AI coding agent from an unpredictable autocomplete into a reliable engineering tool.

This guide walks through that cycle. It is Claude Code-first in its examples, but the principles apply to any agentic coding workflow.

What Is an Agentic Coding Workflow?

An agentic coding workflow is a structured process for using AI coding agents to build software, where the developer directs the work through planning, review, and verification instead of typing prompts and hoping for the best.

That second approach has a name: vibe coding. You describe what you want in natural language, accept the output, and iterate by feel. It works for prototypes, throwaway scripts, and exploration. It falls apart the moment you need the code to run in production, pass review, or survive contact with a real codebase.

The distinction matters because the tooling has gotten good enough to blur it. Claude Code, Cursor, Windsurf, Copilot agent mode: they can all produce working code from a loose prompt. The question is whether it is the right code. Correct for your architecture, consistent with your standards, and safe to ship without a week of cleanup.

                   Vibe coding                       Agentic coding
Developer role     Prompt writer                     Architect + reviewer
Output quality     Works (maybe)                     Correct, reviewable, shippable
Best for           Prototypes, scripts, exploration  Production code, team codebases
Failure mode       Silent bugs, wrong approach       Slower start, higher floor

Spec-driven development (SDD) has emerged as the formal methodology behind this shift. AWS Kiro, GitHub spec-kit, Zencoder. Multiple vendors now build around the same idea: write the spec before you write the code. But you don't need a vendor tool to adopt the discipline. What follows is the workflow I use every day with Claude Code.

The Plan-Audit-Implement-Verify Cycle

The official Claude Code best practices describe a four-phase workflow: Explore, Plan, Implement, Commit. That's a good starting point. But it leaves out the step where most of the value lives: auditing the plan before you implement it.

The full cycle:

Phase 1: Explore. Read-only; understand the codebase first.
Phase 2: Plan + Audit. Generate a plan, review it, revise until solid; re-audit if needed.
Phase 3: Implement. One task at a time, with a commit per task.
Phase 4: Verify. Tests, build, and a fresh review session.

Rule: agents advance to the next phase only when deterministic gates pass.

Phase 1: Explore. Before writing any code or even asking for a plan, understand what you are working with. In Claude Code, use Plan Mode (Shift+Tab) to read files, trace dependencies, and understand the architecture. This is read-only. No edits. The goal is context, not action.

Phase 2: Plan + Audit. Ask Claude to generate a detailed implementation plan. Then audit it (more on this in the next section). Revise. Re-audit if needed. This is where teams succeeding with agentic coding invest 70% of their time: problem definition, not code generation.

Phase 3: Implement. Switch to Normal Mode and implement against the approved plan. One bounded task per prompt. Commit after each task so you have rollback points. Feed verification criteria into each prompt so Claude can check its own work.

Phase 4: Verify. Run your test suite. Run the build. Then open a fresh Claude session and have it review the implementation against the original plan. The Claude Code docs call this the Writer/Reviewer pattern: a separate session reviews with fresh context, avoiding the bias of reviewing code it just wrote.

When to skip the plan

Not every change needs the full cycle. The Claude Code docs put it well: "If you could describe the diff in one sentence, skip the plan." A typo fix, a one-line config change, a rename. Just do it.

Planning is most valuable when:

  • The change touches multiple files
  • You are unfamiliar with the code you are modifying
  • The approach is not obvious
  • Getting it wrong would be expensive to undo

For everything else, a quick prompt in Normal Mode is fine. The discipline is knowing which mode the task requires, not applying the heaviest process to every change.

How to Audit a Plan (and When to Stop)

The first plan an agent produces is almost never ready to implement. It's a draft. Treat it like one.

When I review an agent-generated plan, I check five things:

  1. Completeness. Does it address every requirement? If I asked for a feature with error handling, is error handling in the plan or just the happy path?
  2. Correctness. Is the approach right for this codebase? Does it use the existing patterns, or is it inventing new abstractions that clash with what's already there?
  3. Scope. Is it doing too much? Agents love to over-engineer. If the plan introduces a new utility class for something used once, that's a flag.
  4. Gaps. What is missing? Common omissions: migration paths for data changes, rollback strategies, performance implications of new queries, edge cases around empty or null states.
  5. Predictability. Can I predict what files will change and roughly what the diffs will look like? If I can't, the plan isn't specific enough.

How many audit rounds?

One to two rounds is typical. In practice:

  • Round 1: Read the plan. Identify gaps. Ask Claude to revise with specific feedback ("You missed error handling for the API timeout case" or "This should use the existing UserService instead of creating a new one").
  • Round 2: Read the revised plan. Confirm gaps are addressed. If it's solid, approve and move to implementation.

Three or more rounds signals one of two things: the task is too large and needs decomposition, or you aren't giving the agent enough context. If you find yourself on round four, stop. Break the task into smaller pieces. Each piece should be small enough that the plan is obvious after one pass.

Tip

A good heuristic: if the plan fits in your head (if you can mentally trace the implementation from start to finish) it is ready. If you can't hold the whole thing in your working memory, the task is too big for a single agent session.

Surfacing gaps the agent will not find on its own

Agents are good at generating plans that are internally consistent. They are bad at catching what is missing. Three questions help surface gaps:

  • "What does this break?" Ask Claude explicitly. It will often identify downstream effects it did not address in the initial plan.
  • "What happens when this fails?" Error paths, timeouts, partial failures. Agents default to optimistic paths unless you force pessimistic ones.
  • "What existing code does this duplicate?" Agents lack the institutional memory to know that utils/dateFormat.ts already does what they are about to reinvent. Your CLAUDE.md file and custom skills help, but direct questions are faster.

Implementing the Plan Without Losing the Thread

With an approved plan, implementation becomes mechanical. That's the point. You've already made the hard decisions about architecture, approach, and scope during the audit phase. Now you're directing execution.

Three rules keep implementation clean:

One task, one prompt. Break the plan into atomic tasks. Each task should produce a change you can commit independently. "Add the new API endpoint" is a task. "Add the endpoint, update the frontend, write tests, and update the docs" is four tasks pretending to be one.

Commit per task. Every completed task gets its own commit. This is not just good git hygiene. It creates rollback points. If task four breaks something that task two introduced, you can bisect instead of untangling a single massive diff.

Feed verification criteria into each prompt. Do not just say "implement step 3 of the plan." Say "implement step 3 of the plan, then run the existing test suite and confirm nothing breaks." The Claude Code docs call this "the single most impactful thing you can do": give the agent a way to verify its own work. Tests, type checks, build commands. Anything deterministic.
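The commit-per-task and verify-first rules can be sketched as one small shell helper: run a deterministic check, and commit only if it passes. This is a sketch, not official tooling; the check command and commit message below are placeholders for your project's real commands.

```shell
# commit_task: commit only after a deterministic verification passes.
# Sketch only -- the check command and commit message are placeholders;
# substitute your project's real test/build commands.
commit_task() {
  msg=$1
  check=$2                          # e.g. "npm test && npm run build"
  if sh -c "$check"; then
    # Check passed: this task becomes its own rollback point.
    git add -A && git commit -m "$msg"
  else
    echo "verification failed for: $msg -- not committing" >&2
    return 1
  fi
}

# Usage, one bounded task per commit:
#   commit_task "task 3: add /users endpoint" "npm test && npm run build"
```

Because each task only lands when its check passes, a later regression can be bisected to a single small commit instead of untangled from one massive diff.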

When you are working on a larger feature, keep the plan visible. Paste the relevant section into your prompt, or reference it explicitly. Agents do not have perfect memory across long sessions (context windows have limits), and drift is how you end up with an implementation that technically works but does not match what you approved.

The Writer/Reviewer pattern

For anything non-trivial, I use two Claude sessions. One writes. One reviews.

The writing session implements the plan, task by task. When it is done, I open a fresh session and ask it to review the changes against the original plan. The fresh session has no attachment to the code it is looking at. It didn't write it. It'll catch things the writing session glossed over: inconsistencies with the plan, missed edge cases, style violations.

This isn't paranoia. It's the same logic behind code review between humans. The person who wrote the code is the worst person to review it.
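In shell terms, the reviewer step can be a fresh headless session fed the writer's diff. A sketch under stated assumptions: it relies on the Claude Code CLI's -p (print/headless) flag accepting piped input, and plan.md is a hypothetical file holding your approved plan; adjust both for your setup.

```shell
# review_changes: pipe the writer's diff into a fresh headless Claude
# session for review against the plan. Sketch only -- assumes the CLI's
# -p flag and a plan.md file in the repo root.
review_changes() {
  base=${1:-main}                 # branch the feature diverged from
  git diff "$base"...HEAD | claude -p "Review this diff against the
approved plan in plan.md. Flag any deviation from the plan, missed
edge cases, and style violations. You did not write this code; be
critical."
}

# Usage: review_changes main
```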

Verification Gates That Catch What You Miss

Verification isn't a single step at the end. It's a gate between every phase transition.

Gate 1: Plan quality. Before implementation starts, the plan must pass your audit checklist. (Already covered above.)

Gate 2: Per-task verification. After each implementation task, the agent should run tests and confirm the build passes. This is the deterministic check that catches regressions immediately, not three tasks later.

Gate 3: Full-suite verification. After all tasks are complete, run the full test suite, linter, type checker, and build. In CI terms: formatting, lint, typecheck, unit tests, secret scanning as preflight. Integration tests, SAST, and dependency scanning on the PR.

Gate 4: Fresh-eye review. The Writer/Reviewer pattern. A separate session (or a human reviewer, or both) reviews the complete changeset against the plan.

The principle from the Tweag Agentic Coding Handbook is worth repeating: deterministic validation gates should sit between every phase. If the gate doesn't pass, the agent doesn't advance.
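As a sketch, the fail-fast structure of a preflight gate script might look like this. Every gate command below is a `true` placeholder standing in for your real formatter, linter, type checker, test runner, and secret scanner; only the structure is the point.

```shell
# gate: run one deterministic check; report and stop on the first failure.
gate() {
  label=$1; shift
  printf 'gate: %s ... ' "$label"
  if "$@" >/dev/null 2>&1; then
    echo ok
  else
    echo "FAILED -- do not advance"
    return 1
  fi
}

run_preflight() {
  # Each `true` is a placeholder; swap in your stack's real command,
  # e.g. prettier --check ., eslint ., tsc --noEmit, npm test, gitleaks detect.
  gate formatting true || return 1
  gate lint true || return 1
  gate typecheck true || return 1
  gate unit-tests true || return 1
  gate secrets true || return 1
  echo "all preflight gates passed"
}
```

If any gate fails, the run stops there and the agent does not advance; the heavier checks (integration tests, SAST, dependency scanning) stay on the PR pipeline.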

Why this matters more than it used to: CodeScene's research shows AI coding assistants increase defect risk by over 30% in codebases with low code health scores. The speed that agents give you is real. But it compounds bugs just as fast as it compounds features if you don't have gates in place.

Important

If you aren't writing more tests now than before using agents, you're probably shipping more bugs. The speed increase from agentic coding should come with a proportional increase in test coverage. TDD is not optional in this workflow. It is the primary quality mechanism.

Setting up your project for agentic success

The cycle works best when your project is configured to support it. Three things make the biggest difference:

  1. A CLAUDE.md file at your project root. This is your agent's persistent context: build commands, architecture notes, style rules, and common pitfalls. Every session reads it.
  2. MCP servers connecting Claude to your project management tools. When the agent can read your Jira tickets and GitHub issues directly, it generates better plans because it has better context.
  3. A test suite that runs fast. If your tests take 10 minutes, the agent can't use them as a verification gate within the session. Fast tests (under 30 seconds for the relevant subset) make per-task verification practical.
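A minimal CLAUDE.md might look like the sketch below. The commands and paths are illustrative placeholders, not prescriptive; the point is to give every session the same build commands, architecture notes, and known pitfalls.

```markdown
# Project context for Claude Code

## Build & verify
- Install: `npm install` (placeholder -- use your real commands)
- Test (fast subset): `npm test -- --changed`
- Typecheck: `npm run typecheck`

## Architecture notes
- API handlers live in `src/api/`; shared helpers in `src/utils/`.
- Reuse existing services (e.g. `UserService`) before creating new ones.

## Pitfalls
- Date formatting already exists in `utils/dateFormat.ts`; do not reinvent it.
- Run the fast test subset before declaring a task done.
```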

What the Research Says (and What It Misses)

I want to address the METR study directly, because it is the strongest counterargument to everything in this guide.

Experienced open-source developers were 19% slower with AI tools. That's a real finding from a real randomized controlled trial. It deserves honest engagement, not dismissal.

But context matters. The study tested developers who already knew their codebases deeply: repositories they maintained, averaging over a million lines of code and 22,000+ GitHub stars. These are developers at the top of the familiarity curve, where the marginal value of AI assistance is lowest. They were also using early-2025 tools, and METR's February 2026 follow-up acknowledged selection effects in the study design.

The broader data tells a different story for different contexts. Controlled experiments show 30-55% speed improvements for scoped, well-defined tasks. The teams I have worked with, using the structured workflow described in this guide on codebases they were building or learning, saw dramatically larger gains.

The takeaway isn't "AI makes you slower" or "AI makes you faster." It's that the workflow determines the outcome. Unstructured AI usage on a familiar codebase adds overhead. Structured agentic workflows on new or growing codebases, with proper planning and verification gates, are where the productivity data from the state of AI-assisted development in 2026 starts to make sense.

Start Here

Pick one task tomorrow. Something that touches 3-4 files, isn't trivial but isn't a full rewrite either. Run the full cycle:

  1. Explore the relevant code in Plan Mode. Read files. Trace the flow.
  2. Plan by asking Claude to generate an implementation plan. Read it critically.
  3. Audit using the five-point checklist above. Revise once.
  4. Implement one task at a time, committing after each. Include verification criteria in your prompts.
  5. Verify by running your tests, then opening a fresh session to review against the plan.

That's the whole thing. No framework to install. No vendor tool to configure. Just a structured way of working that turns an AI coding agent into a tool you can trust.

The discipline is the differentiator. Not the model, not the prompt, not the IDE. Teams that treat agentic development as a workflow problem instead of a tooling problem are the ones that ship faster without shipping bugs.

If you are ready to go deeper (encoding your team's standards into custom Claude Code skills, connecting your tools via MCP servers, or building agentic workflows specific to your codebase), I help teams do exactly that.


Want to talk about how this applies to your team?

Book a Discovery Call

Not ready for a call? Grab the Claude Adoption Checklist instead.
