A Claude API feature that stores a repeated context prefix server-side so later requests reuse it at reduced cost and latency, lowering the price of repetitive agentic workflows.
How it works
When a request marks a prefix as cacheable, the API stores it server-side and reuses it on later requests that send the same prefix byte for byte. The cache key is the prefix shape: a single whitespace change, a reordered instruction, or an added comment invalidates the prefix entirely rather than partially, and the next call pays full freight on the whole head again.
Why it matters
Agentic workflows resend a large, mostly fixed context on every turn, so caching the repeated prefix removes most of the per-call cost. The trade-offs are immutability-shaped: the default TTL expires fast enough that a paused session pays full freight on resume, near-identical prompt variants each get their own cache key so each shape pays its own cache-miss cost, and refactoring instructions is expensive because prompt stability and prompt iteration now compete.
In practice
A workflow that prepends the same instructions and reference material to every call benefits from caching exactly as long as the prefix stays byte-identical; edit one instruction mid-session and the next turn pays full freight on the entire reference block again.
Related standards and prior art
- Anthropic: prompt caching ยท continuously updated caches a prompt prefix for reuse on later requests
Defined by Ready Solutions AI