I was testing an early alpha of a one-way migration tool when the question that ended up reshaping the project landed mid-test. The tool did one thing: read a Claude Code project's configuration, find the patterns that broke on Opus 4.7, propose fixes. Premise was simple. The engineers upgrading to 4.7 weren't going to read the migration guide line by line, and a skill that ran the audit for them would land. I stopped the test and started re-planning immediately.
The question that landed: what if I want 4.7 most of the time but the option to switch back to 4.6 for token-heavy work? 4.7 uses more tokens. Anyone with long-context workloads will keep 4.6 in rotation for the heavy jobs. A migration-only tool that proposed 4.7-only changes would break the route back. The skill needed to be bidirectional, not one-way. Compatibility, not migration.
That pivot turned the project into the opus-compatibility-scanner, a free Claude Code skill that audits your CLAUDE.md, your AGENTS.md, your settings.json, your hooks, and the Anthropic SDK call sites in your code against 70 patterns where 4.6 idioms mis-fire on 4.7. Critical findings cite Anthropic-official documentation. Every fix waits for your per-finding approval. I ran the finished scanner against my own most-robust Claude Code setup before publishing. Three Criticals and four Warnings came back.
All three started as someone else's bug report. All three came back as findings on my own setup.
opus-compatibility-scanner
A Claude Code skill that audits your project against 70 Opus 4.6 / 4.7 compatibility patterns. MIT-licensed.
Download (.zip) →This is the build story and the audit it produced when I ran the scanner on myself. Most regressions blamed on Opus 4.7 are configuration debt, not model problems, and they hide across five files Anthropic's migration guide doesn't audit. The rest of this post is what I built, why I pivoted, what showed up when I scanned my own infrastructure, and how to install and run the scanner on yours.
What the cutover breaks
Six behaviors change for Claude Code when you move from Opus 4.6 to 4.7. None of them throw errors. No CLI warnings, no failed tool calls, no obvious tells. The first signal you get is usually a session producing output that surprises you, and the cause sits in your CLAUDE.md, your subagent definitions, your hooks, or your skills.
| Behavior | On 4.7 | Impact |
|---|---|---|
| CLAUDE.md interpretation | More literal | Inferred scope is gone |
| Subagents spawned | Fewer by default | Steerable through prompting |
| Tool calls | Fewer by default | Reasoning fills the gap |
| File-system memory | Improved | Scratchpads carry across turns |
| Tokenizer | +0 to 35% tokens | 1M holds ~555k vs ~750k English words |
| Effort calibration | Stricter at low and medium | "Low" means low |
Two of those rows deserve more than a glance. The literal-instruction-following row is the one that breaks the most setups. Directives that worked on 4.6 because the model inferred your intent now apply to the letter. Subagent files that omitted optional fields because 4.6 filled in the gaps run with whatever you wrote, and only what you wrote.
More literal instruction following, particularly at lower effort levels. The model will not silently generalize an instruction from one item to another, and will not infer requests you didn't make.
The tokenizer row matters less for breakage and more for budget. Anthropic's release post puts it at "1.0x to 1.35x as many tokens" for the same input on 4.7. The models-overview tooltip encodes the consequence: a 1M token Opus 4.7 context window holds about 555,000 English words, where the same 1M on Opus 4.6 held about 750,000. A 26 percent reduction in effective English-word capacity at the same budget. If your CLAUDE.md was already pushing the context ceiling on 4.6, it's now closer to the cliff on 4.7.
Two open issues on the Claude Code repository, #17530 on Claude not reading CLAUDE.md, and #18660 on CLAUDE.md instructions being read but not reliably followed, describe the same underlying complaint from different angles. The model isn't broken. The directives were calibrated for a model that filled in the gaps, and the new model doesn't fill them in.
The five files where the debt hides are the configuration layers a Claude Code session reads on every turn.
CLAUDE.md is advisory; the model reads it and tries to follow it. Hooks and SDK call sites are deterministic; they fire every time or they 400. Two of those layers (CLAUDE.md, SKILL.md bodies) carry the literal-reading risk most heavily, because they are also the layers that grow the fastest as a project ages. Anthropic's best-practices page is direct about it: "Bloated CLAUDE.md files cause Claude to ignore your actual instructions!"
The empirical floor under that warning is sharper than it looks. The AGENTIF benchmark evaluated 707 real-world agentic instructions averaging 1,723 words across all major frontier models. The paper's headline result: when instruction length exceeds 6,000 words, instruction-satisfaction scores drop to near zero across every model tested. Even at moderate lengths, the best overall instruction-satisfaction rate any model achieved across the benchmark was 27.2 percent. That's a ceiling, not a floor. Long CLAUDE.md files were never a great idea on 4.6, and they're a quantitatively worse idea on 4.7 because the model that does land your rules now reads them strictly.
Anthropic's April 23 postmortem is direct about the broader pattern: most of what users report as "the model got worse" is configuration changes interacting badly with new behavior, not weights. A reasoning-effort default changed from high to medium. A system-prompt verbosity rule that capped output at 25 words between tool calls caused a 3 percent quality drop on both 4.6 and 4.7. Anthropic's own admission, verbatim: "This isn't the experience users should expect from Claude Code."
If your project already followed Anthropic's best practices closely, you'll read this section and conclude the scanner is overkill. That's a fair read. If your CLAUDE.md was always tightly scoped, your skill bodies were always tightly scoped, your settings.json had no deprecated parameters, and your SDK calls stayed off temperature, top_p, and top_k, you may have zero findings. The scanner is most valuable for the rest of us. The CLAUDE.md that grew organically over six months. The SKILL.md you copied from a tutorial and adapted. The settings.json with a year of accumulated rules. Those are where the scanner earns its install.
The pivot mid-build
Back to that alpha test for a minute. The version I was running expected one direction of travel: 4.6 to 4.7, audit complete, fix the breaks, move on. Most of the patterns I had cataloged were 4.7-only optimizations. The migration mode was hard-coded into the report shape, the apply paths, the way severity got assigned. Then the question landed: what if I want to keep 4.6 in rotation? The answer broke half the assumptions in the codebase.
A team running both models needs something different: findings that hold on 4.6 and 4.7 together, a mode that suppresses 4.7-only optimizations while 4.6 is still in scope, and an audit that won't propose a fix that breaks the route back. None of that was in the alpha. I shut down the test, opened a fresh planning doc, and re-shaped the project into compatibility mode by default with 4.7-only as an opt-in.
How should the scanner apply fixes? I was on the fence. Batch apply lets a user clear a forty-finding report in one accept. Per-finding apply makes them sit through every fix. Batch is faster, per-finding is safer. The deciding factor was something I'd watched happen in my own day-job code reviews: in a batch, things slip through cracks. A wrong fix lands alongside the right ones, and nobody notices because the diff is too long. The scanner is supposed to surface findings the user hasn't thought about; if the apply path is also batched, the user never thinks about most of them. Per-finding won. No batch-apply flags, no --force, no auto-rewrite mode.
That left contradiction detection. The regex-only version was finding "contradictions" everywhere. A directive that said "Ask First when in doubt" and another that said "Depth Over Economy when researching" looked like a conflict to a string match, but the precedence was clear from the bullet structure surrounding them. A heuristic that flagged that pair as Critical destroyed trust in the rest of the report. The fix was a stage-two LLM-judgment pass: if the regex flags a candidate contradiction and the surrounding ±2 lines have no when / if / unless scope keywords, ask a loaded Claude session whether the bullet structure disambiguates the rules. Default verdict on uncertainty: Warning, not Critical. Better to surface a softer finding the user can verify than to misclassify and burn the report.
The argument against this stance is reasonable. Anthropic's own best-practices page already tells engineers to prune CLAUDE.md aggressively, and the scanner's "ambiguous directive" findings duplicate that advice. The counter is that automation matters at fleet scale. Reading "your CLAUDE.md is probably too long" in a docs page is one thing; running a scan that flags the specific lines that are too long, with a Tier 1 citation to Anthropic on each one, is another. The scanner is operationalization, not novel insight. That's the value, and it's enough.
I scanned myself first
My own Claude Code infrastructure already felt hardened against the 4.6-to-4.7 split. I'd published a dedicated post on the split ten days earlier. Production had been running 4.7 since launch. My CLAUDE.md was, by most standards, tighter than the average. The scanner was for other people.
Then I pointed it at my own most-robust project.
My other Claude Code projects, when I scanned them next, came back with zero or one Critical and a couple of Warnings each. The most-robust setup carried the most issues. That surprised me until it didn't. A project with a thoughtful CLAUDE.md, custom hooks, four skills, and an AGENTS.md has more surface area for configuration debt than a thin one. More files, more patterns, more 4.6-era idioms. The cost of curation is the surface area it creates.
The most relatable one was a one-line rule in my root CLAUDE.md banning the creation of unsolicited documentation files. The kind of rule a lot of CLAUDE.md files inherit from Anthropic's own default Claude Code system prompt. On 4.6 the model inferred this meant SUMMARY.md and NOTES.md style summary docs and naturally allowed test files, new components, and migrations as part of completing the task. On 4.7 the rule applies literally, including to the test file I'd just asked the model to write.
I was a little surprised. I used the explain mechanism, read the linked Anthropic docs, and let the skill make the adjustment.
That sequence (surprise → explain → read citation → accept) is the per-finding flow I was on the fence about during the build. Walking through it on my own project, on a finding I cared about, was the moment the design choice paid off. If the scanner had batch-applied my report, I would have accepted all three Criticals and four Warnings in one keystroke and never read a single Anthropic citation. The slower path is the one that taught me what 4.7 does to a rule I had written and forgotten about.
The scanner caught two more findings worth describing. A contradiction pair, flagged as a Warning: a precedence rule for "Ask First" decisions and another for "Depth Over Economy" research, scope-differentiated through bullet structure and not through inline when / if keywords. On 4.6 the layout did the disambiguating work for me; on 4.7 the rules resolved session to session, sometimes one way, sometimes the other. The Critical that hurt most was a deprecated thinking shape in one of my subagent definitions: thinking: {type: "enabled", budget_tokens: N} was the 4.6 idiom; on 4.7 the same shape returns 400. The agent had been failing silently in some sessions and I'd written the failures off as token-cost variance.
What every Critical finding had in common was the Tier 1 citation attached. Each one linked to a specific page on platform.claude.com or code.claude.com where Anthropic documented the change. That mattered more than I expected. On a Critical without a citation, my instinct would have been to argue with the report. On a Critical with the linked Anthropic page, the argument was already settled before I opened the file.
The deeper reason the citation gate works is that default-prompted LLM reviewers distribute findings across whatever severity vocabulary they are handed, regardless of absolute severity. A scanner without an external anchor would have produced three Criticals on the sparsest project I scanned, not zero. The Tier 1 rule prevents that by making a Critical impossible to issue without a verifiable vendor source, which is what kept the sparse-project reports sparse. I unpacked that mechanism (and four other patterns that flatten the curve) in Claude grades severity on a curve.
Here's what one of those drill-downs looks like applied to a real CLAUDE.md excerpt. The top block is what the scanner sees. The strip shows the findings anchored to specific lines. The bottom block is the rewrite the scanner suggests, with the same lines highlighted.
## File output
Never create documentation files (*.md) or README files unless explicitly requested.
## Decision-making
Ask First when in doubt. Depth Over Economy when researching.
## Workflow
After every 5 tool calls, summarize what you've done so far.Always run tests after making code changes.## File output
Don't create new SUMMARY.md / NOTES.md / CHANGES.md unless I ask. Source files, tests, migrations, and configs needed for the task are allowed.
## Decision-making
PRECEDENCE: Ask First wins for user-facing decisions; Depth Over Economy wins for internal research the user has already approved.
## Workflow
(Removed: tool-call-summary scaffolding. 4.7 emits fewer tool calls by default and progress updates are calibrated to the model's pacing.)Always run tests after making code changes.I thought my configurations were safe. I just hadn't run into those issues yet. And now I won't.
That's the kind of result the scanner is built to surface: the issues your configuration carries that haven't broken yet, but will. The next session that resolves the contradiction pair the wrong way is a debugging session you don't want. The next subagent run that silently disables reasoning is a quality regression that's hard to attribute. Catching them before they fire is the entire point.
What it looks like in your hands
opus-compatibility-scanner
Install in 60 seconds. MIT-licensed. No signup, no telemetry.
Download (.zip) →The scanner installs as a standard Claude Code skill. Three ways to get it. Pick what fits your workflow.
Option 1: Download from readysolutions.ai (recommended)
Grab opus-compatibility-scanner.zip from readysolutions.ai/downloads/opus-compatibility-scanner.zip, then extract:
# Project-scoped (this repo only):unzip opus-compatibility-scanner.zip -d .claude/skills/
# User-global (all your projects):unzip opus-compatibility-scanner.zip -d ~/.claude/skills/Option 2: Clone the repo directly into your skills folder
# Project-scoped:git clone https://github.com/readysolutionsai/opus-compatibility-scanner.git .claude/skills/opus-compatibility-scanner
# User-global:git clone https://github.com/readysolutionsai/opus-compatibility-scanner.git ~/.claude/skills/opus-compatibility-scannerOption 3: GitHub release
The same archive is published on the GitHub releases page if you want version history and release notes. Download the .zip and extract it by hand.
After install, start a new Claude Code session. The skill registers automatically. There are no dependencies, no API keys, and no environment variables to set.
One scope note before you run it. This is a Claude Code skill, built around Claude Code's invocation model: the /opus-compatibility-scanner slash command and the native Read, Grep, Bash, and Edit tools running against your working tree. It is not built for claude.ai, Claude Desktop, the mobile apps, or the Claude API, where skills run in a hosted runtime rather than directly against your project checkout. Run it in Claude Code.
You invoke it one of two ways. The slash command /opus-compatibility-scanner triggers it directly. Plain English works too: "Scan my project for Opus 4.6 / 4.7 compatibility issues." The skill description is tuned to fire on natural-language phrasings of the audit intent, including post-upgrade symptom descriptions like "my agent is acting weird since I switched to 4.7."
The first thing the scanner does is ask you which mode you want.
Compatibility is the default. It never proposes a change that would break the other model. If your project still routes some calls to Opus 4.6, this is the mode that holds back 4.7-only optimizations. The report ends with a count of suppressed optimizations so you know they exist.
4.7-only is for projects that have fully migrated. Same patterns, same severity rules, but the suppressions lift. The optimizations that compatibility mode held back become eligible findings.
Diagnose flips the order of severity-class presentation. Instead of leading with Critical, it leads with patterns most associated with the symptom you described. The right mode if a behavior changed after upgrade and you want to find the configuration that explains it.
After mode selection, the scanner enumerates the configuration files in your project and runs the catalogue against them. The drill-down flow is the per-finding loop I built around the design conviction from earlier: yes, skip, or explain. yes triggers a drift-check (the scanner re-reads ±3 lines around the matched line before issuing the Edit, in case you changed the file mid-scan), then applies the rewrite, then shows you the diff before moving on. skip records the skip and moves on. explain expands the rationale: longer "why 4.6 was fine, here's how 4.7 mis-reads, here's what the rewrite preserves" text, plus the full Anthropic citation. The explain path costs an extra turn but never costs a wrong edit.
End of scan, the scanner offers two opt-in artifacts. A JSON manifest at .claude/opus-4.7-migration-manifest.json containing every finding with file path, line number, severity, pattern ID, source tier, and apply-tier classification. A human-readable Markdown report at .claude/opus-4.7-migration-report.md with the same content rendered for review. Both default to "no" on the save prompt. The scanner writes nothing to disk unless you ask. If you're running this on a fleet of projects, the manifest is the integration point; the sub-agent orchestration patterns post covers the dispatch shape if you want to wire it up.
What I cut, and what I'm still wrong about
A few honest limits, ordered from most likely to bite you to least.
The pre-ship moment that hurt was version 3.2.2. I was so close to done. The first end-to-end run produced false positives on the scanner's own pattern documentation, because the catalogue files contain the directive shapes the scanner is built to flag. Of course they do. The scanner was reading its own teaching examples as if they were rules to enforce. The fix was a self-exclusion path that suppresses scans against the scanner's own files. Obvious in retrospect. Annoying to discover at v3.2.2 when I thought the build was done.
The pattern catalogue is dated, not permanent. It captures Anthropic's API and behavior as of 2026-04-26. If Anthropic ships a new model, deprecates a parameter, or changes a behavioral default, the catalogue lags until I (or someone) updates it. Re-run the scanner after major Anthropic releases. Don't treat a clean scan from six months ago as proof your project is still clean.
The catalogue currency caveat is the limitation that bothers me most. Anthropic's release cadence on Opus is roughly 90 to 120 days between major versions, and each release has shifted at least one parameter or default in a way the scanner would need to learn. Building against an evolving API is what every model-specific tool signs up for. The mitigation is to ship updates against the catalogue when something material shifts. The failure mode is what happens if I (or future maintainers) lose the appetite for that work. Treat the catalogue's data_current_through field in the manifest as an expiration marker, not a watermark.
The semantic-judgment gate that suppresses false positives on prose patterns is non-deterministic. The same CLAUDE.md scanned twice on different days might produce slightly different verdicts on the same edge-case pattern. The bias is toward Warning rather than Critical when judgment is uncertain, which bounds the consequences. If you treat the scanner as a deterministic tool the way you'd treat ESLint, you'll occasionally see disagreement between runs. That's a known property, not a bug.
There's also an abstraction-layer path I deliberately didn't build for. If your team routes Claude calls through an abstraction layer (LiteLLM, the AI SDK, or something custom that normalizes parameters across providers), the layer's maintainer absorbs most of the API-shape changes for you. The temperature removal, the budget_tokens deprecation, and the prefill 400 are all things a competent abstraction layer can hide. Operators who pin to dated model snapshot IDs (rather than the floating claude-opus-4-7 alias) get the same effect from a different angle: their model doesn't change unless they change it. The scanner is most valuable for engineers who write directly against the Anthropic SDK and route to floating model aliases.
What I'm thinking about next is a catalogue-update channel that lets the scanner fetch new patterns without a fresh skill download, and a CI-friendly mode that returns structured output for fail-the-build checks. Neither is in v3.2.2. Both feel obvious once you've used the current version against a real project. The MIT license on the skill means anyone can fork and ship those features ahead of me; if you do, please open a discussion on GitHub. Found a bug or false positive in the current version? Open an issue.
Conclusion
Most regressions blamed on Opus 4.7 are configuration debt. The audit is shorter than the migration. If your team is running both 4.6 and 4.7, or recently moved to 4.7 and noticed something off, this is the audit I'd run before debugging the model. If the scan surfaces findings you'd rather not work through alone, Claude Code Infrastructure Setup is the engagement where I configure your CLAUDE.md, hooks, skills, and subagent definitions around how your team ships. The AI Persona Profiler case study gives a concrete sense of the kind of Claude Code skill and hook engineering I do day to day.
opus-compatibility-scanner
Audit your Claude Code project against 70 Opus 4.6 / 4.7 compatibility patterns. MIT-licensed. No signup.
Download (.zip) →