During a Code Quality day at a roughly 1,000-person SaaS org, I stood in front of 400 engineers, product managers, and ops folks knowing dozens of them had already given AI coding tools a shot and walked away disappointed. Their VSCode Copilot trial back in 2025 produced autocomplete suggestions that were almost right, hallucinated past one-line completions, and couldn't follow context outside the file they were editing. A subset kept it on for boilerplate. Nobody used it as the tool they'd reach for when a sprint ticket landed in their queue.
The conventional read of that room: skeptics, hardest segment, wait them out, lead with enthusiasts.
I think that read is wrong. The team that already got burned is your easiest re-engagement, not your hardest. The failure they remember was tool-and-era specific. Once you name what's structurally different now and put them inside one working agentic session on their own code, they flip faster than the people who never formed a strong opinion in the first place.
Why the conventional re-engagement advice wastes the easiest opportunity
The dominant engineering leadership advice on AI rollouts treats skeptics as deadweight. Pete Hodgson's adoption framework, the standard staged-approach essays, even Faros AI's senior-engineer adoption playbook (vendor research, not independent) reach for the same move: build an experimental community of willing volunteers, and let the doubters observe until they come around. The framing is empathetic. It's also a waste.
Burned engineers are not blank slates with bad attitudes. They are the most calibrated audience in your org. They tried Copilot. They watched it suggest wrong types in a typed language. They watched it recommend renaming a function that was being called from another repo, then confidently produce code that referenced a library version they didn't have installed. They formed a precise mental model of what AI-assisted coding looked like in 2025. That model is correct for the tool they used. It's wrong for the tool you're now asking them to try, and the gap is structural, not incremental.
Conventional rollout playbook
- Lead with willing enthusiasts; build experimental community.
- Treat skeptics as a high-friction, low-priority segment.
- Wait for skeptics to observe gains and come around.
- Re-pitch benefits when adoption stalls.
Inverted re-engagement playbook
- Lead with the burned cohort; they are the most calibrated audience.
- Diagnose what they tried, when, on what task class.
- Show structural difference, then put them inside one working session on their code.
- Tighten Claude Code infrastructure for the holdouts who hit friction.
Three groups sit inside the burned cohort, and they don't need the same entry point. The Copilot-burned tried autocomplete in VSCode and found it slowed them down on anything past a one-line completion. Stack Overflow's 2025 Developer Survey shows trust in AI accuracy fell from roughly 43% in 2024 to 33% in 2025 even as the share of developers using or planning to use AI tools rose to 84%, with experienced developers landing at the most cautious end of every cohort cut (Stack Overflow 2025 AI section). The chat-LLM-burned tried Claude or GPT in a browser tab, copy-pasted snippets, then debugged for an hour when the snippet broke something three files away. The observer-burned never tried it themselves but watched a teammate go through one of the above. Their skepticism is borrowed but firmly held.
None of those three is pure unfamiliarity. Each one is a precise grievance attached to a specific tool generation.
What is structurally different now
The argument "this tool is different" has been made before, often badly. The reason burned engineers don't trust the framing is that the framing is exactly what wrote the check that bounced last time.
Skip the framing. Show the mechanism.
What 2025-era Copilot was: an autocomplete model attached to your cursor. You typed. It suggested. You pressed tab. The artifact it produced was a suggestion you couldn't verify until runtime. The trust loop went accept, ship, debug, curse the tool.
What plan-mode in Claude Code is: an agentic loop where the model reads relevant files in your codebase, runs investigative checks, and presents you a plan before touching anything. You review it. Sometimes you catch a gap or tweak the implementation before approving. Only then does any file change. The artifact it produces is a reviewable proposal, not a guess. The trust loop is review, approve, ship, audit the diff.
The first time I gave Claude Code a complex feature in plan-mode and watched it pull the relevant files, run its own investigative checks, and hand back a plan I could correct before any character changed, I said this out loud to nobody in particular: this isn't just an LLM anymore. It's a junior engineer I can task with feature work. With proper configuration and discipline, it ranges into senior territory.
The capability data backs the experience. Claude's score on SWE-bench Verified, the benchmark for resolving filed GitHub issues against open-source repos, jumped from 49% on the upgraded Claude 3.5 Sonnet in October 2024 to 77.2% on Claude Sonnet 4.5 by October 2025 (Anthropic's October 2024 result; InfoQ on the 4.5 score). That's a 28-point jump in about a year on a benchmark that isn't a toy. JetBrains' 2026 research on which AI coding tools developers actually use puts Claude Code at 91% CSAT and an NPS of 54 (JetBrains research). Agentic-class tools and autocomplete-class tools land in different satisfaction bands now.
You don't have to argue any of this to a burned engineer. You have to show them the plan-mode loop on their codebase.
The four-step playbook (and the study you have to address)
The sequence that worked at the day-job rollout, in order, with the parts most leaders skip noted:
Clean diagnostic
Ask each burned engineer what they tried, when, on what task. 'How long ago did you try?' is the wedge. Move the conversation from 'AI is unreliable' to 'one specific tool was unreliable on one specific task class in 2025.'
Differentiation training that names the era
Side-by-side: 2025 autocomplete loop vs current plan-mode loop, with the artifact each one produces. This sparks interest. It does not flip anyone yet.
Working session on the engineer's own code
A burned engineer runs plan-mode against their own service, watches the model identify the right files, propose a plan that includes the test file they would have edited themselves, and produce a reviewable diff. That is the moment that changes minds.
Infrastructure tweaks for holdouts
Read their CLAUDE.md, audit their skills, check their hook configuration. Targeted infra changes recover the engineers who did not flip on session alone.
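A first pass at that infrastructure read can be scripted. The sketch below is a minimal illustration, not a definitive audit: it assumes a conventional project layout (`CLAUDE.md` at the repo root, project settings in `.claude/settings.json` with a top-level `hooks` key, and one directory per skill under `.claude/skills/`). The function name `audit_claude_setup` and the report fields are hypothetical; adjust the paths to wherever your team actually keeps these files.

```python
import json
from pathlib import Path


def audit_claude_setup(repo_root):
    """Report which pieces of Claude Code project config exist in a repo.

    All paths are assumptions about a conventional layout, not a guarantee
    of where a given team stores its configuration.
    """
    root = Path(repo_root)
    claude_md = root / "CLAUDE.md"
    settings = root / ".claude" / "settings.json"
    skills_dir = root / ".claude" / "skills"

    report = {
        "has_claude_md": claude_md.is_file(),
        "claude_md_lines": 0,
        "skill_count": 0,
        "hook_events": [],
    }

    if claude_md.is_file():
        text = claude_md.read_text(encoding="utf-8")
        report["claude_md_lines"] = len(text.splitlines())

    if skills_dir.is_dir():
        # Assumption: each skill lives in its own subdirectory.
        report["skill_count"] = sum(1 for p in skills_dir.iterdir() if p.is_dir())

    if settings.is_file():
        try:
            data = json.loads(settings.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = {}
        hooks = data.get("hooks", {}) if isinstance(data, dict) else {}
        if isinstance(hooks, dict):
            # Record only which hook events are configured, not their contents.
            report["hook_events"] = sorted(hooks)

    return report
```

A holdout whose report shows no `CLAUDE.md`, zero skills, and no hook events is a different conversation from one whose setup is fully populated but still producing friction; the script only tells you which conversation to have.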
Here is the study you have to address head-on. METR ran a randomized controlled trial in 2025 with 16 experienced developers using Cursor Pro and Claude 3.5 / 3.7 Sonnet on tasks pulled from large open-source repositories. Result: developers were 19% slower on average while believing they were 20% faster (METR's July 2025 study). METR's February 2026 update flagged selection bias in the original sample (METR update), but the perception-reality gap is the part that survives. A skeptical engineer who had a "felt fast, was slower" first wave isn't being unreasonable.
This is exactly where infrastructure discipline earns its keep. The trust-building artifact in current-gen tools isn't raw speed; it's a reviewable plan plus a verifiable diff. You can audit what changed. You can require that tests pass before a hook lets a commit through. You can put rules in the layer that enforces them deterministically rather than in the layer that merely suggests them (see the decision tree for CLAUDE.md, settings, skills, and hooks; the Engineering Manager's governance guide covers the three-tier framework this routing slots into). The METR slowdown describes a population doing agentic work with no enforcement scaffolding around it. The recovery isn't "the model got faster." The recovery is "now you can verify what it did before it ships," paired with the agentic Plan-Audit-Implement-Verify cycle.
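The "tests pass before a commit goes through" rule is the simplest example of deterministic enforcement. Here is a minimal sketch using a plain git pre-commit hook, which aborts the commit when the hook exits non-zero. The function name `gate_exit_code` and the choice of test command are assumptions for illustration; the same gate could be wired into whatever hook layer your team uses.

```python
import subprocess
import sys


def gate_exit_code(test_command):
    """Run the test command; return 0 if it passed, 1 if the commit should be blocked."""
    result = subprocess.run(test_command)
    if result.returncode == 0:
        return 0
    # Explain the block on stderr so the engineer (or agent) sees why it failed.
    print("Tests failed; commit blocked.", file=sys.stderr)
    return 1
```

To use it, an executable `.git/hooks/pre-commit` script would call something like `sys.exit(gate_exit_code([sys.executable, "-m", "pytest", "-q"]))`, assuming pytest is the team's test runner. The rule holds whether a human or an agent authored the change, which is the point: it lives in a layer that cannot be talked out of it.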
Where this playbook stalls
The playbook isn't universal. Three failure modes are worth naming up front, each with a triage heuristic.
The hardened anti-AI engineer. Some skepticism isn't burn-driven; it's principled, sometimes ideological, and demonstration alone won't move it. Diagnosis: ask whether their objection is to the 2025 tool or to the category. If they answer "the category," you're not in re-engagement territory. The recommendation, if you want a chance: don't tell them about the benefits. Show them, on work that benefits them directly. Even then it's slow, and some won't come around until the cohort around them has shifted enough that holding out costs more than the perceived cost of the tool.
Platform debt is the second failure mode. A team running agentic tools without deterministic CI, version-controlled environments, and a code-review culture that catches errors before they ship doesn't just fail to gain from agentic output; it amplifies bad output across more files faster than humans can review. Bain's 2025 analysis of generative AI in software development found two-thirds of firms saw low adoption despite rollouts and three-quarters named change management, not tooling quality, as the hardest part (Bain 2025). Diagnosis: ask how long it takes a junior engineer's PR to land in production. If the answer is "weeks because CI is flaky and reviews stack up," CLAUDE.md routing won't help yet. Sequence matters: the platform-engineering work has to land first.
Change fatigue is the third. If your engineers have rolled through four platform shifts in eighteen months, the fifth one will run into a fatigue ceiling that has nothing to do with AI, even when the tool is structurally better and the rollout sequence is right. Diagnosis: count the migrations and reorgs in the last year. If the number is greater than two, the honest move is to slow down and stabilize what's already in flight before adding a new initiative on top.
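The three diagnostic questions above can be collapsed into a small pre-flight check. This is a sketch under stated assumptions, not a validated instrument: the `TeamSignals` fields and risk labels are hypothetical names, and the 14-day cutoff is one reading of "weeks" that you should tune. Only the "greater than two" change count comes directly from the heuristic above.

```python
from dataclasses import dataclass


@dataclass
class TeamSignals:
    objects_to_category: bool      # objection is to AI tooling as a category, not the 2025 tool
    junior_pr_days_to_prod: float  # typical days for a junior engineer's PR to reach production
    changes_last_year: int         # migrations plus reorgs in the last twelve months


def triage(signals, slow_pr_days=14, max_changes=2):
    """Return the stall risks that apply; an empty list means proceed with the playbook."""
    risks = []
    if signals.objects_to_category:
        # Demonstration alone is unlikely to move a category-level objection.
        risks.append("hardened_skeptic")
    if signals.junior_pr_days_to_prod >= slow_pr_days:
        # Platform-engineering work has to land before agentic tooling helps.
        risks.append("platform_debt")
    if signals.changes_last_year > max_changes:
        # Stabilize what is already in flight before adding another initiative.
        risks.append("change_fatigue")
    return risks
```

For example, `triage(TeamSignals(False, 3, 1))` returns an empty list, while a team with a category-level objector, three-week PR lead times, and three reorgs trips all three flags.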
None of these break the thesis. They limit it. The burned-by-Copilot team in a healthy platform-engineering org with enough headroom for one more change initiative is the exact population this playbook is built for, and that population is larger than the conventional read suggests.
The leaders who win the second attempt
In the day-job rollout this playbook came from, hundreds of engineers across the org now use Claude Code for all development tasks, and only a few holdouts are still coming around. The path from "dozens burned by Copilot in 2025" to "hundreds productive on agentic work in 2026" was the four-step sequence above, run on the cohort the conventional advice told us to deprioritize.
The team that gave AI coding tools a shot and walked away disappointed isn't lost. They are the most pre-diagnosed segment in your org. They've already done the work of figuring out what bad output looks like, and they don't need to be re-pitched. They need a clean diagnostic, the structural-difference framing, one working session on their own code, and infrastructure discipline for the engineers who hit friction. The leaders who win the second attempt treat their burned engineers as the most calibrated audience in the room.
If you're staring at a second AI rollout and the first one didn't land, the Consultations and Workshops tier is built for exactly this scope: a structured diagnostic plus a working session against your codebase, then a tightening pass on Claude Code infrastructure for the holdouts. Book a 15-minute discovery call to walk through what your re-engagement looks like.