About twenty-five minutes into a 90-minute Claude Code training session for 400-plus people at an enterprise SaaS org, an engineering manager dropped a question into chat that I had been waiting for. "If half my team is on 4.6 and half is on 4.7, do I tell them all to upgrade?"

The honest answer, delivered live, was that this is the wrong question. The right question is whether your team's CLAUDE.md, hooks, skills, and shared scripts are safe across both versions. Six days earlier I'd pointed my own opus-compatibility-scanner at my own most-robust Claude Code project, and it surfaced 3 Critical findings plus 4 Warnings I had not known I had. If a setup that careful had drift, every mixed-fleet team has drift. The model upgrade is the smaller half of the problem. The configuration audit is the larger half, and nobody's talking about it.

I counted roughly twelve to fifteen messages in that 400-person chat touching the same shape of question. Every one came from someone running a team. Engineering managers, tech leads, staff-plus engineers fielding their team's tickets. The questions about effort levels and tokenizer cost were coming from a different cluster: people on keyboards, writing the prompts. Different audience, different question. The existing 4.7 commentary almost entirely answers the writing-the-prompts version, and it barely touches the running-the-team version.

What changed between 4.6 and 4.7 that propagates through your team's config

Anthropic shipped Opus 4.7 on April 16, 2026. The migration guide lists nine documented behavior changes, and the companion what's-new page frames them as "not API breaking changes but may require prompt updates" before they are visible. Every one of those changes is also a way that a shared CLAUDE.md, skill, or hook will behave differently on each engineer's machine depending on which model their session resolves to.

The four most consequential, in the order they show up in tickets.

Literal instruction following. Opus 4.7 reads CLAUDE.md and prompts more literally than 4.6. It will not silently generalize from one item to another, and it will not infer requests that were not explicitly made. Anthropic frames this as "may require prompt updates," not as a defect. If your CLAUDE.md was written under 4.6's helpful-inference defaults, 4.7 will execute it strictly and find it incomplete. The effort-ladder post named this directly: "practitioners with looser instructions, who relied on 4.6 to fill in blanks, are likely to feel more friction on 4.7." The mixed-fleet symptom is that two engineers with the identical CLAUDE.md get different output, and the one on 4.7 files a ticket calling it a regression.

Different default effort by model version. As of Claude Code v2.1.117, the default effort level is xhigh on Opus 4.7 and high on Opus 4.6. Anthropic's effort documentation confirms it: xhigh is Opus 4.7-only, and on 4.6 it falls back to high. Anthropic also notes: "We expect effort to be more important for this model than for any prior Opus." A team running an A/B comparison between the two versions, or a CI script that does the same work across both, is comparing different effort calibrations even if no one set anything explicitly. See the Claude Model x Effort Matrix for the full support grid showing where xhigh applies.

Fewer subagents and fewer tool calls by default. Both the what's-new page and the migration guide document this verbatim: "Claude Opus 4.7 tends to spawn fewer subagents by default. However, this behavior is steerable through prompting; give Claude Opus 4.7 explicit guidance around when subagents are desirable." A CLAUDE.md or skill that worked on 4.6 because the model decided on its own that this task warranted three subagents will now spawn one on 4.7 unless you tell it otherwise. The exposure compounds for bundled skills like /batch that fan out across many subagents under one approval. Same prompt. Different output. No error. Silent.

Tone and memory shift. Two changes that read minor on a benchmark but show up immediately in code review. Opus 4.7 is more direct and uses fewer emoji than 4.6's "warmer" style, per Anthropic's model overview, and its file-system memory (scratchpads, notes files, multi-session structured stores) is meaningfully better. For an EM, this shows up as review comments, agent commentary, and committed agent notes that look different across the team. Your code review tooling may flag the inconsistency before you do.

SAME CLAUDE.MD, DIFFERENT BEHAVIORINSTRUCTIONSDEFAULT EFFORTSUBAGENTSTONEOpus 4.6InferentialhighSpawns proactivelyWarmerOpus 4.7LiteralxhighFewer, steerableMore direct

The full enumeration of silent changes, including the file-system memory and progress-update differences, sits in the Opus 4.7 deep dive I wrote when the model shipped. It is the canonical list and worth reading once before the audit. What matters here is that none of these changes is a defect. Each one is documented. Each one was a deliberate Anthropic design decision. And every one of them propagates differently through the same configuration file when it is read by 4.6 versus 4.7.

Three breaking API changes that hard-fail one half of your fleet

The behavioral changes above will produce different output and are not breaking API changes per Anthropic's framing. There are three other changes that ARE on the API surface, separate from the nine behavioral items above, and all documented in the migration guide. Two are hard 400 errors on 4.7 that Opus 4.6 silently accepts. The third is silent in a different way: no error, just a quietly different response.

Sampling parameters removed. Setting temperature, top_p, or top_k to any non-default value on Opus 4.7 returns HTTP 400. Opus 4.6 still accepts these parameters without error. Any shared script, hook, or CI step that still threads these parameters through will silently work for 4.6 users and hard-fail for 4.7 users. The fix: remove the parameters. The detection is harder, because the failure shows up at runtime in the engineer who upgraded, not at code review time.

Extended thinking budget removed. The old shape, thinking: {type: "enabled", budget_tokens: N}, is no longer supported on 4.7 and returns 400. Adaptive thinking on 4.7 must be opted into explicitly with the new field shape. Anthropic's own Python migration example walks the new pattern. If your team has a CLAUDE.md template, an MCP server config, or an agent harness that still configures budget_tokens, that surface is now a 400 error generator on every Opus 4.7 session.

Thinking content omitted by default. This is the one that scares me most because there is no error to catch it. On Opus 4.6, summarized thinking text was returned by default. On Opus 4.7, the thinking field is empty unless the caller opts in. A debugging harness, an evals pipeline, a logging layer that depended on seeing thinking output for any reason will silently stop seeing it. No exception. No 400. Just a quietly different pipeline. The migration guide labels this a "silent change," verbatim.

The Tier-1 source for all three is the same Anthropic migration guide. The engineering teams that absorbed this pre-launch via dedicated release-day reading already updated their shared scripts. The teams that deferred reading the migration guide until "we have a real reason to upgrade" inherited the breakage when their first engineer upgraded and the rest of the team did not.

0 .35x Tokenizer multiplier ceiling on Opus 4.7
0 API-surface changes that break a mixed fleet (2 hard 400, 1 silent)
0 .4 pts BrowseComp regression: 4.7 vs 4.6 (Vellum)
0 Patterns the opus-compatibility-scanner audits

The 1.0x to 1.35x tokenizer multiplier is documented on Anthropic's what's-new page and means the same input string costs more tokens on 4.7. Your shared max_tokens settings, your compaction triggers, and your usage-window math all need re-baselining. The 4.4 percentage-point BrowseComp regression I will come back to in its own section, because it is the one item on this list that no configuration audit can fix.

The hidden mixed-fleet you didn't know you had

Here is the part that surprised me when I dug into the model-config docs for the audit. If your team uses the opus alias instead of pinning a specific version ID, you may already be running a mixed fleet without anyone choosing to.

The opus alias resolves to Opus 4.7 on the Anthropic API and to Opus 4.6 on Bedrock, Vertex AI, and Foundry. Two engineers on the same team, both running claude --model opus, may be hitting two different model versions because their claude-code deployment is wired to different upstream providers. The 4.7 deep-dive I wrote called this a "split, not an upgrade" for a reason that becomes more concrete here: Anthropic shipped 4.7 as a different fleet, not a successor, and the alias system reflects that.

Pinning the version is a three-part move, not a one-line change. The official docs spell it out, and the lever I see EMs running Bedrock or Vertex deployments miss most often is the third one.

THREE-PART FLEET CONTROLavailableModelsRESTRICTSWhich modelsusers can pickin the /modelpickermodelDEFAULTSWhich modelruns by defaultunless user picks"Default"ANTHROPIC_DEFAULT_OPUS_MODELRESOLVESWhat "Default"actually meansfor the opus aliasAll three. The first two without the third is the failure mode.

availableModels is the allowlist. It restricts which models a user can pick from the /model slash command. model is the default model that loads on session start. The third lever, ANTHROPIC_DEFAULT_OPUS_MODEL, is the one EMs miss most often. It is the environment variable that defines what "Default" resolves to in the picker. If a user selects "Default" instead of a named model, the resolution path runs through this variable. Set the first two without the third, and a user who clicks "Default" gets whatever the opus alias resolves to in their deployment, which on Bedrock or Vertex is still 4.6.

There is one more drift mechanic that catches teams running A/B comparisons between the two versions. Switching models with the /model command in Claude Code persists the selection to ~/.claude/settings.json, and as of v2.1.117 also writes to .claude/settings.local.json if the project pins a different model. A community-reported GitHub issue tracks the related effortLevel persistence: switching between Opus and Sonnet via /model can silently overwrite an explicit effortLevel setting because Opus and Sonnet have different default effort calibrations. An engineer who flips to 4.6 to debug something, then flips back to 4.7, may now be running 4.7 at whatever effort their settings persisted. This is not a controlled comparison, and it explains a class of "Claude Code feels different today" tickets that have nothing to do with the model and everything to do with persisted state.

Caution

The /model picker also has a "Default" option that bypasses your model pin. Anthropic's docs are explicit: the model setting is "an initial selection, not enforcement." A user can still open /model and pick "Default," which resolves to the system default for their tier and to whatever the opus alias resolves to in their deployment. On Bedrock or Vertex, the opus alias resolves to 4.6 unless you set ANTHROPIC_DEFAULT_OPUS_MODEL. On the Anthropic API, it resolves to 4.7. Without that third lever, you are not pinned, you are guessing.

The configuration audit checklist for a mixed-fleet team

Once you accept that the version split creates configuration drift on its own, the EM's task changes shape. It is not "schedule a model-update training" or "send a policy memo to the team." Both of those treat this as a knowledge problem. It is a configuration problem, and the audit unit is your repo, not your roster.

Five layers, in the order I run them.

Layer 1: pin the model version explicitly. Set availableModels, model, and ANTHROPIC_DEFAULT_OPUS_MODEL together in your managed settings.json. This is the deterministic enforcement layer. Skip the third lever and you are still running the alias. The decision tree for where this sort of rule belongs (managed vs project vs user settings.json) is the rule-routing decision tree I wrote last week.

Layer 2: audit CLAUDE.md against literal reading. Walk every directive and ask the strict-reader question. Would a model that reads this exactly as written, with no inference, do what you want? If your CLAUDE.md says "follow our coding conventions," 4.7 is going to ask itself "which conventions?" and proceed with whatever it decided was reasonable. Conventions need to be enumerated, linked to a reference file, or moved into a skill. This is the layer that resolves the largest share of "Claude Code is ignoring our rules" tickets in my experience.

Layer 3: scan hooks and shared scripts for the three breaking API changes. temperature, top_p, top_k, and budget_tokens should not appear anywhere in your .claude/ directory or your CI scripts on the 4.7 path. Anything that consumed thinking output for debugging or evals needs the new opt-in shape. The opus-compatibility-scanner I shipped last week automates seventy patterns across this layer; running it once gives you a baseline, and the report names the file and line for each finding so a tech lead can triage in an afternoon.

Layer 4: re-baseline max_tokens and compaction triggers. The 1.0x to 1.35x tokenizer multiplier means the same conversation hits compaction sooner on 4.7 than on 4.6. If your team has CLAUDE.md sections that say "compact at N turns" or skill scripts that calculate context budget, those numbers were calibrated against 4.6's tokenizer and need a fresh measurement. The fix is mechanical, but it has to happen.

Layer 5: encode the standard into deterministic layers, not advisory ones. The directive "always pin the model version" in CLAUDE.md is a request, not a guarantee, on either model. A PreToolUse hook that blocks an agent from invoking a tool with deprecated parameters is a guarantee. A custom skill that wraps the model-config setup as /standardize-claude-code is the move that makes it repeatable across projects. The skills hands-on guide is the implementation path. A well-written skill is also model-version-agnostic, which is the property you want for a mixed fleet that will eventually be 100% on whichever version Anthropic ships next.

The instinct to fix a mixed-fleet drift problem with a "what's new in 4.7" training session is the same instinct that explains why bought AI licenses sit unused for three months. Training does not encode behavior. Workflow does. The layers above are the workflow. The training session is a useful complement, not the primary lever.

If you have not already run a structured infrastructure audit on a Claude Code rollout, this is the engagement shape that fits: I run Claude Code Infrastructure Setup inside the team's own codebase, audit the five layers above, and configure the artifacts (managed settings.json, hooks, shared skills) around how the team already ships.

The one thing the configuration audit can't fix

There is one item on the 4.6-versus-4.7 list that the audit will not solve, and it is worth being honest about. Vellum's independent benchmark analysis reports that Opus 4.7 scored approximately 79.3% on BrowseComp, down from 83.7% on Opus 4.6. That is a 4.4 percentage-point regression on the benchmark that measures multi-step web research performance. Anthropic's own launch materials do not publish this figure.

For most Claude Code workloads, this regression is not load-bearing. Coding tasks, refactors, agentic pipelines that operate on files in a repo: these do not depend on multi-step web research, and 4.7 strictly outperforms 4.6 on them. The configuration-audit framing wins here cleanly.

For the narrow class of agent workloads that depend on multi-source web synthesis and contradiction detection (research agents, market intelligence pipelines, fact-check loops), 4.6 may genuinely be the better model, and no amount of configuration tuning on 4.7 restores the missing capability. This is a training-side regression, not a configuration-debt regression. The honest fleet-management move for that class of workload is to keep those specific agents pinned to 4.6, route them explicitly through model: "claude-opus-4-6", and revisit when Anthropic ships a 4.7-class model that closes the BrowseComp gap.

This is the steel-man of the post. The thesis ("it's a configuration problem, not a model problem") holds for the dominant Claude Code use cases. It does not hold for the small class of research-heavy agentic workloads where the model-side regression is real. An EM who runs both classes should know which is which, and route accordingly.

What I told the EM in chat

Back to the 400-person training session. The EM who asked the version-uniformity question got a different answer than the chat thread expected, and the room quieted for a moment when I said it. The reframe lands the same way every time I've delivered it.

You don't have a model-version problem. You have a configuration problem that the model split made visible. The fix isn't picking a side. The fix is auditing the five layers your team's rules live in, pinning the version explicitly with all three levers, and encoding the parts that have to hold deterministically into hooks and skills. Once that's done, the question of whether the team is on 4.6 or 4.7 becomes a routine upgrade question. Until then, no upgrade decision will feel safe, because the upgrade will keep surfacing drift that was already there.

This is the part the existing 4.7 commentary almost entirely misses. The migration guide is API-scoped. The benchmark analyses are model-scoped. The "should you upgrade" comparisons are practitioner-scoped. The EM seat needs the team-and-config-scoped version, and that is the post you just read.

If your team is in the mixed-fleet position right now and you want a structured pass through the five layers before your next "Claude Code is ignoring our rules" ticket lands, book 15 minutes and we'll look at the actual configuration files together. The opus-compatibility-scanner is free to run yourself if you'd rather start with the audit and bring me the report.

The right answer to "if half my team is on 4.6 and half is on 4.7, do I tell them all to upgrade?" is "not yet, and not for that reason." Audit the config first. The version question gets easier when the configuration question is no longer doing the work.