Model effort matrix

A model effort matrix is the joint decision of which model tier to run a task on and how much reasoning effort to spend on it, treated as two coupled axes of one routing choice rather than a single model selection.

How it works

Two controls, set independently of each other, determine what a task costs and how well it is handled. The first is the model tier: a more capable model is slower and more expensive, a faster model is cheaper and less capable, and the right tier depends on how demanding the task is. The second is effort, a setting that varies how many tokens a given model spends on a response, trading thoroughness for latency and cost without changing which model runs. Because the two can be set separately, the same task can be served by a strong model at low effort or a lighter model at high effort, and those are different points on the cost-and-capability surface, not the same point reached two ways. The routing decision is to choose the point that matches the task: enough capability and enough effort to clear the bar, and no more.
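The two axes can be pictured as a small grid of (tier, effort) points. The sketch below is illustrative only: the tier and effort labels, the relative cost numbers, and the multiplicative cost model are all invented assumptions, and `RoutingPoint`, `cheapest_point`, and `clears_bar` are hypothetical names rather than part of any provider's API.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional

# Hypothetical labels and relative costs, invented for illustration only.
TIER_COST = {"fast": 1, "mid": 4, "capable": 20}
EFFORT_COST = {"low": 1, "medium": 2, "high": 4}


@dataclass(frozen=True)
class RoutingPoint:
    tier: str
    effort: str

    def relative_cost(self) -> int:
        # Assumed multiplicative cost model; real costs have to be measured.
        return TIER_COST[self.tier] * EFFORT_COST[self.effort]


def cheapest_point(
    clears_bar: Callable[[RoutingPoint], bool],
) -> Optional[RoutingPoint]:
    """Return the lowest-cost point that clears the task's bar, or None."""
    grid = [RoutingPoint(t, e) for t, e in product(TIER_COST, EFFORT_COST)]
    viable = [p for p in grid if clears_bar(p)]
    return min(viable, key=RoutingPoint.relative_cost, default=None)


# Illustrative task: cleared by the mid tier only at high effort,
# and by the capable tier at any effort level.
def clears_bar(point: RoutingPoint) -> bool:
    return point.tier == "capable" or (
        point.tier == "mid" and point.effort == "high"
    )


best = cheapest_point(clears_bar)
print(best)
```

With these invented numbers the mid tier at high effort costs less than the capable tier at low effort, so it is the point selected. Different cost figures could flip the result, which is why the two combinations count as distinct points rather than equivalents.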

Why it matters

Treating model choice as the only lever that matters leads teams to reach for the most capable model by default and pay for capability the task does not need. Effort is frequently the better first move, since raising effort on a mid-tier model often clears a task that seemed to require a larger one, at a fraction of the cost of moving up a tier. The trade-off is that the two axes interact, and not linearly: maximum effort on a small model is not equivalent to a large model at low effort, and past a certain point more effort buys overthinking rather than accuracy. The matrix is therefore something to tune against real tasks rather than a formula to apply. The decision is also workload-shaped: a high-volume, latency-sensitive path and a rare, accuracy-critical one sit at opposite corners of the matrix even within one system, which is why a single global setting tends to be wrong for one of them.

In practice

A pipeline has two kinds of work: a high-volume classification step that runs constantly, and an occasional synthesis step where a wrong answer is expensive. Routing both to the most capable model at full effort is simple and overpays badly on the classification step; routing both to the cheapest model at low effort is also simple and underperforms on the synthesis. The matrix decision puts the classification step on a fast model at low effort and the synthesis step on a capable model at high effort, matching each to the corner it belongs in rather than picking one setting for both.
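The per-step routing described above can be written down as a small lookup table. This is a minimal sketch: the step names, the tier and effort labels, and the `Route` and `route_for` names are hypothetical, chosen only to mirror the example.

```python
from typing import NamedTuple


class Route(NamedTuple):
    tier: str
    effort: str


# Hypothetical routing table: one entry per kind of work,
# instead of a single global setting shared by both.
ROUTES = {
    "classification": Route(tier="fast", effort="low"),
    "synthesis": Route(tier="capable", effort="high"),
}


def route_for(step: str) -> Route:
    """Look up the (tier, effort) pair configured for a pipeline step."""
    try:
        return ROUTES[step]
    except KeyError:
        # Fail loudly rather than fall back to a global default.
        raise ValueError(f"no route configured for step {step!r}") from None


print(route_for("classification"))
print(route_for("synthesis"))
```

Raising on an unknown step is a deliberate choice in this sketch: a silent default would reintroduce the single global setting the matrix is meant to avoid.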

Practical considerations

A sensible order for tuning the two axes is to settle the lowest effort that holds quality on a given model first, and only then decide whether the model tier needs to change at all, because the effort move is cheaper to test and to reverse. The effort axis is not available on every model, and the range of effort levels can differ between models, so the cheapest, fastest tier may expose no effort control at all and force the decision back onto model choice. Effort affects more than reasoning depth: it changes how many tokens the model spends across the whole response, including how many tool calls it makes, so a lower-effort agent is terser and more direct and a higher-effort one is more exhaustive. A long-horizon agentic task usually wants both a capable tier and a high effort level, since it has to sustain coherence across many steps, while a short, scoped lookup is the canonical place to drop both. The matrix is not static: a model release shifts the whole surface, and a tier that was right last quarter can be over- or under-powered now, so the routing is worth re-checking whenever the underlying models change. The within-model reasoning-budget mechanics that sit under the effort axis are treated in depth under the extended thinking term.
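The tuning order (effort first, tier second) can be sketched as a nested search. Everything here is an assumption for illustration: `tune` and `holds_quality` are hypothetical names, the tier list is made up, and the quality check stands in for whatever evaluation a team actually runs against real tasks. A tier with no effort control is modelled as an empty list of levels, and is tried once with effort set to `None`.

```python
from typing import Callable, Optional, Sequence, Tuple

Choice = Tuple[str, Optional[str]]


def tune(
    tiers: Sequence[Tuple[str, Sequence[str]]],
    holds_quality: Callable[[str, Optional[str]], bool],
) -> Optional[Choice]:
    """Return the first (tier, effort) pair that holds quality, or None.

    `tiers` is ordered from the preferred starting tier upward; each tier
    carries its effort levels ordered lowest first (empty if the model
    exposes no effort control).
    """
    for tier, efforts in tiers:
        # Exhaust the effort axis on this tier before changing tier.
        levels = list(efforts) or [None]
        for effort in levels:
            if holds_quality(tier, effort):
                return tier, effort
    return None


# Hypothetical tiers; the fast tier exposes no effort control.
TIERS = [
    ("fast", []),
    ("mid", ["low", "medium", "high"]),
    ("capable", ["low", "high"]),
]

# Stand-in for a real evaluation: the set of settings that pass.
PASSING = {("mid", "high"), ("capable", "low"), ("capable", "high")}

choice = tune(TIERS, lambda tier, effort: (tier, effort) in PASSING)
print(choice)
```

This search stops at the first passing setting and does not compare costs across tiers, so it illustrates the order of moves rather than a guarantee of the cheapest point. It also needs re-running when the underlying models change.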

Related standards and prior art

Anthropic: choosing a model (continuously updated). Presents a model selection matrix across capability tiers and names the effort parameter that trades intelligence for latency and cost within one model.
Anthropic: effort parameter (continuously updated). Documents the effort axis as a ladder of levels trading capability for latency and cost within a single model.

Defined by Ready Solutions AI
