When is MCP the right shape, and when is it overhead?

MCP is the right shape when the capability you want sits inside the protocol's vocabulary of tools, resources, and prompt templates, and when at least two compliant clients (now or later) will benefit from the inversion. It is overhead without payoff when the capability needs a primitive the protocol does not define, when only one caller will ever talk to the server, or when the operational cost of carrying tool definitions in every request exceeds the portability win.

What does the spec actually guarantee, and what does it leave to you?

The spec defines primitives (tools, resources, prompts, sampling, elicitation), two transports (stdio and Streamable HTTP), version negotiation via a dated protocol version, and an OAuth 2.1-based authorization model for HTTP transports. It explicitly does not enforce security at the protocol level, does not define which OAuth scopes a server should expose, and does not standardize multi-tenancy or observability. Governance, scope design, audit logging, and rate limiting are the deployer's responsibility.

How does a production MCP deployment go wrong?

Four named ways. The protocol cannot prevent a server from misrepresenting its tool annotations, so an untrusted server can lie about side effects. Tool descriptions accumulate as context bloat once a client connects to many servers. The stdio transport degrades under concurrent load it was not designed for. Upstream API deprecations propagate through any server that wraps them, so a server that ran cleanly all year can break on a vendor's calendar rather than yours.

Cornerstone Guide

MCP Servers in Production: Ownership Inversion, the Protocol Vocabulary Limit, and What the Spec Leaves to You

Q: What is an MCP server?

A program that exposes tools, resources, and prompt templates to Claude Code and other Model Context Protocol clients over a standardized protocol, so an agent can read from and write to external systems without bespoke per-integration code. The integration to the external system lives in the server, not in the agent: that is the ownership inversion the pattern is built around.

An MCP server moves the integration to a system into the server and out of the agent. The inversion pays inside a finite protocol vocabulary; outside it, the protocol is overhead. The spec also draws a second boundary worth naming, where functional integration moves but governance does not.

Last reviewed May 25, 2026

MCP server Tool use Claude Code hook Production agentic delivery AI authoring trust chain MCP authorization scope

What is an MCP server, in one paragraph?

An MCP server is a program that advertises a fixed set of primitives (tools the agent can call, resources it can read, and prompt templates it can reuse) to any client that speaks the Model Context Protocol. Claude Code and Claude.ai are both compliant clients, and so is a growing list of competing agents. The integration to the underlying system (a Jira tenant, a GitHub repository, a Postgres database, a customer's S3 bucket) lives inside the server. The agent learns about the capability at the protocol layer rather than absorbing the integration code, which means a capability authored once is reachable by every compliant client without further per-client work. That is the ownership inversion the pattern is built around, and it is what I want to start with rather than the wiring detail. If you want the wiring detail, the post on connecting Claude to Jira, GitHub, and Confluence covers the setup end of the work.

The setup post is one half of the story. This page is the other half: what the inversion buys you, where it stops paying, what the spec hands back to you for governance, and what I have watched fail in production. I have been running MCP integrations against an enterprise Jira deployment since May 2025, tracking the spec's evolution and the connector vendors' deprecation calendars on a quarterly cadence. The lived experience that grounds this cornerstone is the audit discipline that surrounds the deployment; the upcoming Atlassian HTTP+SSE cutover I describe below is the next live test of the framework, and I have built the deployment expecting it to land cleanly.

What does the ownership inversion actually buy you in production?

Three properties, named directly, in the order they pay off.

Portability across agents. A server authored against the MCP spec talks to every compliant client. The same Atlassian Rovo MCP server reachable from Claude Code is reachable from Claude.ai, from Cursor, from VS Code's GitHub Copilot, and from any other agent that speaks the protocol. The integration code is not duplicated per agent; the protocol contract is. If I rewrote the integration logic from one client's tool format to another's every time a teammate switched editors, the work would never finish. Portability is what makes the work finish.

Swappability under upstream API change. This is the property the cornerstone leans on hardest, and the property the live deployment is built around. Atlassian announced the deprecation of the legacy HTTP+SSE endpoint in favor of Streamable HTTP for its Remote MCP server, with the legacy endpoint scheduled to stop accepting connections on June 30, 2026. The team I support runs the out-of-the-box Atlassian connector, and my expectation, based on how the protocol's primitive layer works, is that the cutover will require zero operator action on my side: the server changes its transport behind the same advertised primitives, and the agent on the other side keeps calling the same tools through the same protocol contract. The integration logic moves without the agent or its prompts having to move. The expectation is testable; the cutover is the test; the audit cadence is what makes the result observable rather than incidental.

Reviewability as a single unit. A capability that lives inside an MCP server can be reviewed, version-pinned, and revoked as one unit. A capability that lives tangled inside an agent's prompts and tool wrappers is reviewed nowhere except the diff log. I treat a new MCP server the same way I treat a new package dependency: I read what it exposes, check what scopes it requests, decide what the blast radius is, and decide who owns the upgrade calendar. The audit playbook is the operational form of that practice, and the discipline maps onto the decision tree for where rules belong in Claude Code: permissions on what a server's tools can do are deterministic boundaries, and deterministic boundaries belong in configuration that fails closed, not in prose the model may or may not honor.

A useful frame from one of Anthropic's own production posts on agent design is that an MCP server is not a single capability but an inventory of capabilities, and that inventory has its own ergonomics. The Anthropic engineering essay on code execution with MCP frames the same dynamic at scale: an agent connected to many MCP servers has to load and reason about tool definitions before it does any real work, and that loading cost is itself a design constraint. The inventory framing is not a contradiction of the ownership inversion; it is the operational shape the inversion takes once a real team is running more than one server.

What does the spec define, and what does it leave to you?

I would rather state the boundary than gloss it. The current stable spec is dated 2025-11-25, the same revision that introduced experimental Tasks and tightened authorization. A future revision (the 2026-07-28 release candidate, currently in public review on the MCP blog) introduces a stateless protocol core and a formal twelve-month deprecation policy, but those changes are not in the stable surface yet.

What the spec defines:

Three server primitives. Tools (functions the agent can call), resources (data the agent can read), and prompts (templates the agent can reuse). A given server advertises some subset of all three.
Two transports. The stdio transport runs the server as a child process of the client and exchanges JSON-RPC over its stdin and stdout; this is the smallest blast radius and the right shape for a locally installed integration. The Streamable HTTP transport uses HTTP POST for client-to-server messages with optional Server-Sent Events for streaming server-to-client responses; this is the shape for remote or hosted servers behind an auth boundary. The older HTTP+SSE transport is deprecated and exists only for backwards compatibility.
Version negotiation. The client sends the protocol version it supports in the initialize request. The server must echo the same version if it supports it; otherwise it offers its own. Both parties agree to use only capabilities that were successfully negotiated.
An OAuth 2.1-based authorization model for HTTP transports. The authorization specification requires servers to implement RFC 9728 Protected Resource Metadata, requires clients to implement PKCE and Resource Indicators (RFC 8707) on token requests, and requires servers to validate that incoming tokens were issued specifically for them as the intended audience. The model is OAuth 2.1 with the modern hardening profile, not a custom scheme.

What the spec explicitly does not define:

Tool-annotation enforcement. Annotations are hints. A server can label a tool as read-only, and a malicious server can label any tool as read-only, and the protocol does not catch the lie. The maintainers of the spec are direct about this in a recent post on what those annotations can and cannot carry:

If you need a guarantee that a tool can't exfiltrate data, that's a job for network controls or sandboxing, not a boolean hint. MCP maintainers, "Tool Annotations as Risk Vocabulary"

OAuth scope shape. The spec lays down OAuth 2.1 mechanics but does not say what scopes a server should expose. Every server author makes that decision; every deployer must read and audit it. The GitHub MCP server's configuration documentation shows what a thoughtful scope design looks like (toolsets that can be enabled or disabled, a read-only mode that takes precedence over any other configuration); a less thoughtful server can put the same write capability in a default scope a user never reviewed.
Multi-tenancy. The spec does not standardize how a single server serves multiple tenants. Production teams handle this by isolating servers per tenant or per scope. The Cloudflare reference architecture ships a general Cloudflare API MCP server alongside sixteen product-specific MCP servers, so a client connecting for observability uses a different endpoint and OAuth scope than a client connecting for audit logs or AI Gateway. That partitioning across servers is the deployer's architectural choice, not a protocol guarantee.
Observability of stdio servers. A stdio server is a child process. There is no HTTP endpoint to instrument, no metrics surface to scrape. If the server is silent, the agent that called it is the witness of record, and the agent's witness is unreliable by design.
Rate limiting and cost attribution across multiple clients. The spec does not say what happens when ten agents share one server; the deployer decides.

The split worth naming is between functional integration and operational governance. Functional integration genuinely moves into the server: that is the inversion. Operational governance (consent flows, scope design, audit logging, rate limiting, multi-tenancy, deprecation watch) explicitly stays with whoever runs the deployment. The spec is honest about that boundary; the cornerstone is dishonest if it implies otherwise.

Where does the protocol vocabulary stop paying?

Five named cases where I would not reach for MCP.

The capability needs a primitive the protocol does not define. Streaming a partial response that the agent should consume mid-flight, exchanging a binary file the agent should treat as a blob rather than a resource, holding session state that crosses many separate tool calls in a way that fits awkwardly onto the protocol's per-call request/response shape: in each of those cases, the protocol is in the way rather than helping. The 2026-07-28 release candidate moves toward stateless servers and an extensions framework precisely because the current spec's stateful shape was friction; until that revision lands, the friction is the operational reality.

Only one caller will ever talk to the server. Portability is the load-bearing benefit. If the server will be reached by exactly one agent that I also own, with no plausible future where a second client wants the same capability, the inversion has nothing to invert. Direct tool use from inside the agent's prompt context can do the same work without the JSON-RPC layer; the Claude API tool-use surface is the right primitive in that case.

The operational cost of carrying tool definitions per request exceeds the portability win. Every tool a server advertises ships its description into the agent's context on every call. A GitHub MCP server with well over a hundred tools spread across nearly twenty toolsets (as of June 2026, per the github-mcp-server README) carries tens of thousands of tokens of tool definitions before the agent has done any real work. Connect three or four such servers and the metadata alone displaces real reasoning room. The Anthropic code-execution-with-MCP essay frames the same dynamic at scale: an agent connected to many servers can pay a large multiple of its useful context in tool definitions before producing useful work. Simon Willison has written about stepping away from MCP for coding agents in favor of CLI utilities and direct libraries; his stated reasons are that the tool descriptions consume valuable real estate in the agent's context and that chaining tool calls through the context wastes tokens. The author's reading — that this overhead is doing work a CLI utility or library could do in-process, and that the standardization which makes MCP composable is also what makes that overhead a constant tax rather than something an individual server can engineer away — is an extension of Willison's argument, not his own framing.

The reasoning that uses the capability is tightly coupled across systems. If a single decision needs to read from a database, transform with a custom function, and write to a queue in a way where each step's output materially changes the next step's input, putting each step behind a separate MCP server adds round-trips at every transition. The protocol's strength (clean capability boundaries between servers) becomes a liability when the reasoning needs to cross those boundaries dozens of times in one decision. That class of work belongs inside a code-execution surface or a single agentic loop, not in a fan of MCP calls.

The team owning the deployment is not ready to carry what the spec hands back. This is the soft case and the most important one. An MCP server is a maintained dependency on someone else's evolving API. If the team that wires it up does not own the audit cadence, does not have a deprecation-watch routine, does not have someone whose job it is to review the scope changes the vendor ships, then the server's failure mode is a calendar nobody is reading. The work I describe in the engineering manager's guide to governing agentic development is the leadership layer that makes MCP carry its weight; without that layer, the cleanest possible MCP wiring is still a future incident waiting on someone else's deprecation notice.

The boundary is not "MCP is bad" anywhere in this list. The boundary is that the inversion has conditions of payment, and a production team gets value from the inversion only when those conditions hold.

What does a production lifecycle look like?

A production MCP server is a long-lived dependency. Its lifecycle has four shapes worth tracking explicitly, because each is a place teams get surprised.

Transport evolution. Transports are not stable. The HTTP+SSE transport defined in the 2024-11-05 spec is now deprecated; Streamable HTTP is the spec-preferred shape, and live deployments have moved one by one. Atlassian's June 30, 2026 cutover is the most visible upcoming example, and the language in the deprecation notice is plain:

HTTP+SSE remains only for backward compatibility and won't see future improvements. Atlassian, "HTTP+SSE Deprecation Notice for Atlassian Rovo MCP Server"

A team that picked up the out-of-the-box connector pays this cutover with zero operator action because the integration code lives in the server; the connector's transport changes behind the same advertised primitives. A team that built a custom client against the legacy transport pays it with a migration. Knowing which side of that line your deployment sits on is a one-time question with a long-lived answer, and the cutover-window is when it gets answered for free.

Capability versioning. The protocol version is a date string, and capabilities are negotiated explicitly during the initialize handshake. Because capabilities are negotiated rather than inferred from the version, a server can grow new optional capabilities without breaking older clients. The flip side is that clients can no longer assume a server speaks the latest revision just because the protocol version on its endpoint looks current. The capability table negotiated at initialize is the source of truth; treat it as a contract, log it on connect, and audit it when the server-side team ships a release.

Authorization posture. For HTTP-based servers the spec requires OAuth 2.1 with PKCE and Resource Indicators. The discipline in production is that the server's scope catalog is reviewed before the server is wired, that the client never holds a token broader than the smallest scope its agent needs, and that tokens are audience-bound to the specific server. The GitHub MCP server's read-only mode is a good shape of the discipline: read-only is a strict filter that disables write tools regardless of other configuration, so a misconfigured agent cannot accidentally write where the scope said it could only read. Cloudflare's own remote MCP fleet ships a stronger version of the same discipline at the architecture level, by running separate servers per capability so the OAuth scope a client requests is naturally bounded to one capability's reach rather than the union of every capability the vendor offers. That is not a recipe; it is a shape. A team can adopt the same shape on their own infrastructure regardless of where the server is hosted.

Audit cadence. Quarterly is the cadence I run. Each quarter, I read the current tool catalog (which tools, which scopes, which capabilities marked write), compare it to the catalog last quarter, check the vendor's deprecation calendar for anything within a six-month window, and confirm the team's token store hasn't drifted out of rotation. The cadence is the same shape as the agentic development plan-audit-implement-verify loop, just applied to the integration layer rather than the application layer. The point of the cadence is not to find problems on every pass; it is to make the deprecation calendar visible before the deprecation hits.

What goes wrong in production, and how do teams catch it?

Four named failure modes. Each has a structural mitigation; none has an editorial mitigation.

Tool-annotation lies (and broader untrusted-server risk). A server can advertise that a tool is read-only when it is not, or that a tool is safe when it has side effects the annotation does not mention. The spec acknowledges the gap directly and points to the cure: out-of-band controls. Network policy, sandboxing, allowlists on tool calls inside the client, and per-tool human-in-the-loop gates are the layer that catches this. The Claude Code hook surface is where I put the deterministic gates: a PreToolUse hook on a sensitive scope is what turns a "the model promised to be careful" into "the runtime checked before letting the call through". The hook gate is a partial answer, not a complete one: it disciplines the call site, but it does not eliminate the broader composability risk that Simon Willison names as the lethal trifecta, where an attacker-controlled server can poison the agent's context and influence a subsequent call to a trusted server through the shared reasoning surface. The structural fix for that failure mode is not at the call gate but at the trust boundary itself: do not connect untrusted MCP servers to an agent that also has access to private data and a network exfiltration path, and treat the question of which servers may share context as a first-class design decision rather than a default.

Context budget collapse under multi-server load. Tool descriptions accumulate. A client connected to five MCP servers carries every tool's description on every call. At small server counts the cost is invisible; past a threshold it dominates. The structural cures are to disable the toolsets a workflow does not need (the GitHub MCP server's per-toolset configuration is built for this), to load tool definitions lazily where the runtime supports it, and to be honest about the operational cost when the workflow is connected to enough servers to feel it.

Stdio transport under load. The stdio transport is JSON-RPC over a child process. A load test published by Stacklok on August 19, 2025 (using their open-source yardstick MCP echo server on Kubernetes) found that at twenty concurrent connections, only two of fifty expected requests succeeded; twenty-two were actually issued, and twenty of those failed before the test window closed. The failure mode is structural rather than a bug: stdio was specified for locally installed servers with a single client process, and concurrent multi-client load is not the shape it was specified for. Production deployments that need concurrent load belong on Streamable HTTP.

Upstream API deprecation cascade. Every MCP server wraps somebody else's API. When that API breaks, the wrapper breaks, and the agents that called the wrapper break with it. The cascade is what makes the audit cadence load-bearing. The audit playbook is the operational version of the cure: a schedule that surfaces the deprecation on your calendar rather than the vendor's. Without the cadence, the failure mode is "a Tuesday morning where the server stops working and nobody owns the migration".

There is a fifth class of failure mode worth naming separately because it is structurally different: supply-chain compromise of the MCP layer itself. The OX Security advisory from April 2026 documents the pattern in its specific form: a command-injection root cause in Anthropic's MCP SDK that lets a malicious server author smuggle arbitrary shell commands through the STDIO transport's configuration surface, hitting registry-published packages that wire MCP into common IDE and agent contexts (CVE-2026-30615 Windsurf RCE and CVE-2026-30623 LiteLLM command injection are the named examples). The mitigation is the same shape as for any other software supply chain: pin versions, prefer first-party servers when they exist, audit STDIO configuration surfaces for command-injection paths, and treat MCP servers as a privileged dependency class rather than a regular library. What MCP changes is the blast radius: a vulnerable server with broad scopes is closer to "vulnerable system service" than "vulnerable library".

When is MCP the wrong shape entirely?

This is the trade-off matrix. The hardest cell is the one labeled "single-caller, code-execution-shaped work", and the diagnostic for it is whether the work would be the same whether or not the protocol existed. If you would build it the same way without MCP, MCP is not earning its overhead.

Situation	Right shape
Two or more compliant clients (now or planned) need the same external capability	MCP server
Capability fits inside tools, resources, or prompt templates as primitives	MCP server
Server runs remote behind an auth boundary, with multiple clients	Streamable HTTP MCP server
Server runs locally as part of a single agent's session	stdio MCP server
Capability needs streaming partial response the agent consumes mid-flight	Not MCP; direct API or tool use inside the agent loop
One caller only, no portability case	Direct API, or tool use in the agent's context
Reasoning crosses systems many times per decision	Code-execution surface or in-process integration; not a fan of MCP calls
Requirement is "this must happen every time" rather than "this should usually happen"	Hook on the deterministic path; MCP is the surface, the hook is the gate
Capability is sensitive enough that you cannot accept tool-annotation hints	MCP behind a hook gate, or not at all
Team has no audit cadence or deprecation-watch capacity	Defer the MCP wiring until the operational discipline exists

A useful single-sentence form of the matrix: MCP is the right shape when at least two clients benefit, when the capability fits the protocol vocabulary, and when the team owns the audit cadence the vendor's deprecation calendar will eventually exercise. Outside any of those three conditions, the cleaner answer is usually somewhere else in the stack.

The decision is also adjacent to the wider question of where in Claude Code a given rule belongs. An MCP server is a tool surface; it is not a place to enforce policy. Policy belongs in hooks and settings, where the runtime decides whether a call goes through at all. Pinning that boundary in your team's mind early saves a great deal of debate later when someone proposes that a server "just enforce" something the protocol cannot enforce.

How do MCP and the other Claude Code primitives compose?

The honest answer is "as four cooperating layers", and any treatment that picks one layer in isolation misses how production teams use them in practice. The skills cornerstone frames the same decomposition from the skills side; the part of the decomposition that this page owns is the external-system layer.

Skills carry procedural knowledge the model loads on demand. Hooks carry deterministic gates the runtime enforces every time. Subagents carry isolation per worker for work the orchestrator decomposes across lanes. MCP servers carry the external-system access the other three cannot legally provide. The composition rule I use: if the requirement is "the model needs access to a system it cannot see, with per-user authorization", MCP is the right primitive; if the requirement is "this must happen every time the agent touches the integration", that is a hook on top of MCP, not a different replacement for it. The four layers cooperate; they do not substitute for each other.

A useful pattern I run on this site's own infrastructure: an MCP server defines what the agent can reach, a settings.json scope describes which tools of that server are allowlisted, a PreToolUse hook enforces the deterministic policy on the calls that do go through, and a skill encodes the procedural knowledge for when and why the agent should reach for the integration in the first place. Each layer catches a different failure class; none replaces the others.

How is this page kept current?

This cornerstone has the role: cornerstone posture rather than role: freshness-reference, so the build does not hard-fail it on a 30-day window. The cadence is editorial. I revisit the page when the MCP spec ships a revision (next anticipated at the 2026-07-28 RC's eventual GA, with the stateless-core and formal-deprecation-policy changes that page implies), when a major vendor announces a transport or auth posture change, or when a new named failure mode surfaces that the trade-off matrix should engage.

The cornerstone is the deep companion to the MCP server glossary entry, with tool use, Claude Code hooks, production agentic delivery, and the AI authoring trust chain as the related vocabulary. The internal evidence anchor is the proof-bank row for the Enterprise-org Claude Code + MCP + Jira deployment that the audit cadence has carried through the protocol's first major spec revision and is positioned to carry through the Atlassian HTTP+SSE cutover; the external anchors are listed under Sources below with their publication dates.

When I scope a consultation, workshop, or implementation engagement around MCP, the work usually starts with the trade-off matrix above rather than with the wiring. Wiring is the cheap half. Deciding what belongs behind the protocol and what does not is the half that keeps paying.