Opus 4.7 vs 4.6 in Claude Code: Where the Upgrade Shows Up, and When xhigh Is Worth the Tokens

For the first few days on Claude Opus 4.7 in Claude Code, I ran the new xhigh effort level across my normal workflows. Effort levels control how much Claude thinks before it acts; Claude Code had promoted xhigh to the default when the Opus 4.7 upgrade landed, and Anthropic recommends it as the starting point for coding and agentic work. Day after day, I watched the token counter drain for only incremental improvements over what I was already getting at high on Opus 4.6. The exception, and it matters, was complex feature work where decisions in one module rippled through several others. On those sessions, xhigh clearly outran high. Everywhere else, I went back to high, and I have been running high on most agentic work ever since.

That is a small decision, not a hot take. The point of this post is why it was the right one for me, why I think it is the right one for most Claude Code practitioners, and what else about Opus 4.7 is worth knowing before you touch the effort dial at all. The 4.6-to-4.7 upgrade shows up at every effort level: sharper reasoning, a more direct and less validation-forward tone, better file-system memory, and a more literal reading of whatever sits in your CLAUDE.md. These are first-person observations from daily use since the upgrade landed, not benchmark claims, and you can feel every one of them without ever leaving high.

What changed in Opus 4.7 for Claude Code users

The feature drop reads differently inside Claude Code than it does at the API level.

Claude Code v2.1.111 shipped on the same day as the model, April 16, 2026, and it made two Claude-Code-specific choices. First, the v2.1.111 release notes added the new xhigh level to the effort ladder. Second, Anthropic's best-practices post confirms Claude Code now sets xhigh as the default effort when you are running Opus 4.7.

You have four ways to change the effort level. The quickest is /effort inside a session: it opens an interactive slider you move with arrow keys, so you can dial up or down mid-task without restarting. Pass --effort <level> at session start for scripts and one-shot jobs. Set a persistent preference in ~/.claude/settings.json to lock a default for your user. Or export CLAUDE_CODE_EFFORT_LEVEL (set to a level like high or max) as an environment variable if you are running Claude Code from a shell that configures it. Selections persist across restarts as of v2.1.117, so whatever you pick stays put. For the rest of your session configuration (recaps, focus mode, and other preferences), /config is the general settings surface, and /model switches between Opus 4.7 and Opus 4.6 mid-session if you want to compare them on the same task.

A few things do not apply inside Claude Code, or apply differently. task_budget, the beta primitive that gives Opus 4.7 an advisory token countdown across a loop, is explicitly not supported on Claude Code or Cowork at launch. The /effort command is your primary lever for pacing token spend. Everything else you read about Opus 4.7 behavior lands inside the CLI: fewer tool calls by default, fewer subagents spawned unless you ask, more literal instruction following, more regular progress updates on long runs, and a more direct tone with less validation-forward phrasing than 4.6. That last one is Anthropic's own language. What practitioners tend to call "less sycophancy" is the same change seen from the user's seat.

The five Claude Code effort levels, explained

Anthropic's effort documentation describes each rung in its own words. I have added a one-line practitioner translation next to each so you can read the ladder from both angles.

Level	Anthropic's guidance	What it feels like in Claude Code
`low`	Efficient, but best for short, scoped tasks.	Literal to the point of terse. Good for rename-a-variable, lint-this-function, quick lookups.
`medium`	"The drop-in for the average workflow where you want good results while reducing costs."	Noticeably cheaper than `high`. Respect what you asked for. Will not go beyond it.
`high`	"Advanced use cases... often the sweet spot balancing quality and token efficiency."	Where I live for most agentic work. The 4.7 upgrade over 4.6 is already visible here.
`xhigh`	"The recommended starting point for coding and agentic work... meaningfully higher token usage than `high`."	Claude Code's default on 4.7. Thinks longer before touching anything. Expensive.
`max`	"Reserve for genuinely frontier problems... can lead to overthinking on structured-output tasks."	I have not found a practitioner use case for this yet. Anthropic's own guidance is to treat it as a reserve.

Two things worth flagging. First, Anthropic says effort is more important on 4.7 than on any prior Opus. That matches what I see. Second, the migration guide notes that 4.7 respects effort strictly, especially at the low end. Code that worked fine at low on 4.6 may under-think on 4.7 because the model now treats "low" as "do exactly what was asked, no more." If you step down, step down on purpose.

Why I reverted from xhigh to high

The token math is the first reason. Opus 4.7 ships with a new tokenizer, and Anthropic publishes the range honestly: expect roughly 1.0x to 1.35x as many tokens on the same input compared to prior models. Practitioners measuring it in the wild have landed at the top of that range. Simon Willison clocked 1.46x on a standard system prompt. Abhishek Ray measured 1.21x on code diffs, 1.37x on user prompts, and 1.45x on CLAUDE.md files, with a weighted average of 1.33x across Claude Code-typical content. Anthropic's own models overview footnote makes the squeeze visible in a different way: the 1M context window on 4.7 holds roughly 555,000 English words, the same 1M on 4.6 holds about 750,000.

Now compound that with xhigh, which Anthropic explicitly describes as producing "meaningfully higher token usage than high." A Claude Code session that ran on 4.6 at high for an hour will run the same work on 4.7 at xhigh for considerably more tokens. Same sticker price, different effective cost. Max-plan subscribers whose limits are usage-hour windows will hit those windows sooner.

The second reason is output. On the codebase audit that made me notice, xhigh produced a longer report with more self-questioning and more hedging between its own conclusions. high surfaced the same substantive findings with less ceremony. For a complex feature branch where the model had to discover conventions it had not seen before, xhigh was worth it. On a routine audit of a codebase it already knew, it was overkill. The steel-man version of this, and it is a legitimate one: the 4.7 capability gains hold up, but on cost-sensitive daily work the tokenizer penalty at xhigh default can cancel them out. Reverting to high is the move that captures the upgrade without the bill.

When xhigh earns its token premium

The sessions where I keep xhigh on look alike. They are complex feature work in a codebase whose conventions the model has to discover. Multi-file refactors across subsystems that talk to each other. Agentic pipelines where a subagent is going to run a tool loop unsupervised for a while and the cost of a wrong first step is worse than the cost of extra thinking up front.

The shape that justifies it is specific. It is not "this problem is hard." Most of the hard problems I work on resolve fine at high. It is "this problem is hard, and the model has to discover context it does not have before it can act, and a wrong first edit will compound." The AI Persona Profiler pipeline is a concrete example: ten-plus coordinated Opus instances, adversarial dual-analysis, twelve-point voice rubrics, reconciliation across agents. That kind of pipeline is where xhigh earns its premium. If you are not running that shape of work, high is almost certainly enough. At xhigh, Claude Code draws on extended thinking to work through those cross-module dependencies before touching any file.

Why high on 4.7 is the daily workhorse

The 4.7 upgrade over 4.6 shows up at every effort level, and high is where it pays off without the runaway bill.

The four things I notice most, in rough order of frequency. First, the reasoning feels sharper on the first pass. Fewer rounds of "you missed this, try again." Second, the tone is more direct. The model pushes back when it disagrees, asks a clarifying question instead of guessing, and drops most of the validation-forward phrasing 4.6 leaned on. Anthropic describes this as "more direct, opinionated tone with less validation-forward phrasing..." (the full line adds "and fewer emoji than Claude Opus 4.6's warmer style"). Third, file-system memory improved. A scratchpad or notes file across a multi-session workflow carries forward now, without me re-priming context. Fourth, and this is the one that reshaped how I write instructions, 4.7 reads them more literally. The "helpful inference" 4.6 gave you for free is gone.

All four are free at high. You do not need xhigh to get any of them.

Setup hygiene: why well-structured setups win on 4.7

The more-literal instruction-following is a gift for practitioners who already write explicit prompts, and a small tax on practitioners who do not. I had not rewritten a single CLAUDE.md, skill definition, or subagent prompt since the upgrade. Not because I was lazy, but because my existing setup was already built on Claude Code best practices: official documentation referenced at decision points, context-heavy well-structured prompts, no implicit "figure out what I meant" directives. The upgrade felt like a win rather than a tax. Two days after publishing this post I built a free Claude Code skill that audits the same five layers (CLAUDE.md, AGENTS.md, settings.json, hooks, and SDK call sites) for the 4.6 idioms 4.7 mis-reads, and I ran it against the same setup I called "already built on best practices" above. It returned three Critical findings I didn't know I had. Practitioners with looser instructions, who relied on 4.6 to fill in blanks, are likely to feel more friction on 4.7. See the custom-skills post and the plan-audit-implement-verify starter guide for the patterns that pay off here. If that infrastructure needs a refresh on your project, Claude Code Infrastructure Setup is scoped as a standalone engagement.

One regression worth naming explicitly. The Register reported more than thirty GitHub issues filed in April 2026 about false-positive refusals on Opus 4.7, flagged to new real-time cybersecurity safeguards that Anthropic documents as a deliberate 4.7 addition. Legitimate security work has been blocked in some cases. If your Claude Code work sits near that surface (penetration testing, vulnerability research, some forms of security research), know the safeguard exists and that Anthropic's mitigation path is the Cyber Verification Program. Narrow blast radius, documented remediation, not a general strike against 4.7, but worth knowing before it bites you mid-session.

A decision rule you can keep in your head

Run Opus 4.7 at high by default. Reach for xhigh when the session is complex feature work in a codebase whose conventions the model has to discover. Let medium handle well-scoped work where you want to save tokens, and respect that 4.7 will take "medium" literally. Leave max alone until you find a problem it uniquely solves. If you do, I would like to hear about it.

That is the rule. The upgrade to 4.7 is worth it. The default effort level is the one thing I would change before you run your next session.

If you are working through how your team's Claude Code setup maps to 4.7's behavioral changes, my Claude Code engagements are the fit. Bring the CLAUDE.md you are already running, and I will take it apart with you.

Glossary terms used

Extended thinking

Opus 4.7 vs 4.6 in Claude Code: Where the Upgrade Shows Up, and When xhigh Is Worth the Tokens

What changed in Opus 4.7 for Claude Code users

The five Claude Code effort levels, explained

Why I reverted from xhigh to high

When xhigh earns its token premium

Why high on 4.7 is the daily workhorse

Setup hygiene: why well-structured setups win on 4.7

A decision rule you can keep in your head

Agentic AI Governance in Production: Who Owns the Bar When the Agent Ships

Running Claude Code as a Production Engineering Practice

Continue reading: more in Lead with Claude

Claude Fable 5's Silent Degradation: The Safety Tier You Couldn't See, Log, or Turn Off

Claude Fable 5 Is 'Mostly Drop-In.' The Word Doing the Work Is 'Mostly.'

Opus 4.8 vs 4.7, One Week Later: The Upgrade Call I Couldn't Make on Day One

Sources

What changed in Opus 4.7 for Claude Code users

The five Claude Code effort levels, explained

Why I reverted from xhigh to high

When xhigh earns its token premium

Why high on 4.7 is the daily workhorse

Setup hygiene: why well-structured setups win on 4.7

A decision rule you can keep in your head

Reference guides for this topic

Agentic AI Governance in Production: Who Owns the Bar When the Agent Ships

Running Claude Code as a Production Engineering Practice

Continue reading: more in Lead with Claude→

Claude Fable 5's Silent Degradation: The Safety Tier You Couldn't See, Log, or Turn Off

Claude Fable 5 Is 'Mostly Drop-In.' The Word Doing the Work Is 'Mostly.'

Opus 4.8 vs 4.7, One Week Later: The Upgrade Call I Couldn't Make on Day One

Sources

Continue reading: more in Lead with Claude