A Claude capability that allocates the model a token budget for an internal reasoning phase before its final answer, trading speed and cost for deeper decomposition of complex tasks.

How it works

The capability gives the model a separate token budget to reason internally before it commits to a final answer; those thinking tokens count against billing but are not returned to the caller. A larger budget buys deeper decomposition of a hard problem and a smaller one returns faster, so the knob trades latency and cost against reasoning depth.

Why it matters

The budget is a soft cap, not a quality floor: spending more does not guarantee a proportionally better answer, and for tasks the model handles correctly on reflex, the extra latency and cost buy nothing measurable. Tuning is empirical with no model-card formula, so knowing when to spend the thinking budget is its own skill that competes with prompt engineering for the workflow owner's attention.

In practice

A routine edit runs with a small thinking budget for a fast answer; a multi-file refactor is given a larger budget so the model can work through the dependencies before responding. The caller never sees the deliberation tokens but pays for them on both calls.

Related standards and prior art

Defined by Ready Solutions AI