
MCP explained: why your agent should write code, not make tool calls

How Model Context Protocol works, why one-tool-per-endpoint fails at scale, and how to build a production-ready MCP server for SaaS.


MCP (Model Context Protocol) has 97 million monthly SDK downloads, 9,400+ public servers, and support from every major AI vendor. 78% of enterprise AI teams have at least one MCP-backed agent in production. If you're building an MCP server to connect your SaaS to AI agents, this is the standard you'll build on.

But most MCP servers are built wrong. They expose one tool per API endpoint, dump thousands of tool definitions into the agent's context, and hope the model picks the right ones. There's a better pattern that's been hiding in plain sight: let the agent write code.

what is MCP (Model Context Protocol)

Model Context Protocol is an open standard, originally created by Anthropic, now governed by the Agentic AI Foundation under the Linux Foundation. It standardizes how AI agents discover and use external tools. Before MCP, every agent framework had its own way of registering tools, describing schemas, and handling invocations. MCP replaces all of that with one protocol that works across clients.

The architecture is straightforward. An MCP server exposes tools (functions agents can call), resources (read-only data), and prompts (reusable templates). An MCP client -- Claude, Cursor, a custom agent -- connects to the server, discovers what's available, and uses it. The wire protocol is JSON-RPC 2.0. Transport is either stdio (local, single-client) or streamable HTTP (remote, multi-client, the one you want for production).
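Under the hood, every discovery and invocation is a JSON-RPC 2.0 message. As an illustrative sketch, here's what one tools/call round trip looks like on the wire -- the message shapes follow the MCP spec, but the tool name and arguments are invented for the example:

```typescript
// A tools/call request from the client and the server's response,
// expressed as TypeScript object literals. The tool name `get_invoice`
// and its arguments are hypothetical, not part of any real server.

const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "get_invoice",                      // a tool the server advertised
    arguments: { invoiceId: "inv_123" },
  },
};

const response = {
  jsonrpc: "2.0" as const,
  id: 1,                                      // echoes the request id
  result: {
    // MCP tool results carry a list of content blocks
    content: [{ type: "text", text: '{"status":"paid"}' }],
  },
};
```

The client first discovers what's available via a tools/list call of the same shape, then invokes by name as above.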

That's it. MCP doesn't make agents smarter. It gives them a standard way to find and call tools. What you put behind those tools is what matters.

why OpenAPI-to-MCP server generators fall short

The most common way to build an MCP server today: take an OpenAPI spec, generate one tool per endpoint, register them all. Tools like Speakeasy and Stainless have evolved past simple one-tool-per-endpoint generation -- Stainless now generates code-mode servers with search_docs and execute tools, and Speakeasy offers scope-based filtering with token optimization. But the default output from most OpenAPI-to-MCP generators still gives you one tool per endpoint.

The problem shows up at scale. The Cloudflare API has over 2,500 endpoints. That's 2,500 tool definitions that need to load into the agent's context window. Anthropic measured this: traditional tool loading for a large API consumes over 150,000 tokens before the agent processes a single user request. For Cloudflare's full surface, it's 1.17 million tokens.

This is context poisoning. The agent spends its working memory reading tool descriptions instead of doing work, and accuracy drops as the tool count rises. MCPMark, a benchmark built from realistic multi-step MCP workflows, found that even the best model (gpt-5-medium) completed only 52.6% of real CRUD workflows. Most strong models fell below 30%.

There's also a round-trip problem. In the traditional model, every API call is a separate tool invocation. The agent calls tool A, waits for the result, reasons about it, calls tool B, waits again. A 7-step workflow means 7 full neural network forward passes, 7 network round trips, and 7 chances for the agent to lose the thread. Agents take 1.4 to 2.7 times more steps than necessary when working this way.

One tool per endpoint works for small APIs. It falls apart for anything real. We covered the broader implications of this in endpoint-level MCP isn't enough.

MCP code mode: why agents write code instead of calling tools

Here's the thing that doesn't get enough attention: LLMs are good at writing code. Really good. SWE-bench Verified has top models solving over 80% of real software engineering tasks. Claude Code, Cursor, Codex -- the entire AI coding wave exists because models can write correct, working code when given the right context. We wrote about this tension in vibe coding vs. workflow reliability -- models can write code, but unguided code execution without workflow knowledge creates its own problems.

So why are we asking them to pick from a list of 2,500 tools?

Code Mode, a pattern developed by Cloudflare and documented by Anthropic, flips the approach. Instead of exposing every endpoint as a separate tool, you expose three tools: one to search for relevant endpoints, one to inspect their schemas, and one to execute code. The agent writes TypeScript that chains multiple API calls together, and a V8 sandbox runs it.

The numbers: Cloudflare's 2,500+ endpoints go from 1.17 million tokens down to roughly 1,000 tokens. That's a 99.9% reduction. And the footprint stays fixed regardless of API size. Whether the API has 50 endpoints or 5,000, the agent sees the same three tools.

Anthropic measured a similar reduction: 150,000 tokens down to 2,000, a 98.7% decrease for their test case.
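Both figures check out as straightforward arithmetic:

```typescript
// Back-of-the-envelope check of the token reductions quoted above.
const pctReduction = (before: number, after: number) =>
  ((before - after) / before) * 100;

// Cloudflare: ~1.17M tokens of tool definitions collapse to ~1,000
// tokens for the three code-mode tools.
console.log(pctReduction(1_170_000, 1_000).toFixed(1)); // "99.9"

// Anthropic's test case: 150,000 tokens down to 2,000.
console.log(pctReduction(150_000, 2_000).toFixed(1));   // "98.7"

// The Cloudflare figure also implies the per-endpoint cost:
// 1_170_000 / 2_500 ≈ 468 tokens per tool definition.
```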

But the token savings are almost secondary. The real win is execution efficiency. Code mode is a fundamentally different execution model from traditional agent tool calling. Instead of 7 sequential tool calls with 7 round trips, the agent writes one block of TypeScript that handles all 7 calls internally. Intermediate results stay in the sandbox; the agent only sees the final output. A workflow that used to take 7 reasoning cycles collapses into one.

how MCP code mode works: search, inspect, execute

The pattern, as implemented in production MCP servers, gives the agent three tools:

search_tools -- the agent describes what it needs in natural language. The server searches across all registered API endpoints and returns matching tool names with their TypeScript interfaces. The agent sees a handful of relevant endpoints, not thousands.

tool_info -- the agent asks for the full TypeScript interface of a specific tool. Types, parameters, return values, everything needed to write correct code against it.

call_tool_chain -- the agent writes TypeScript code that calls multiple tools, processes intermediate results, handles branching logic, and returns the final output. The code runs in a V8 isolate with no filesystem access, no network access outside of registered tools, memory limits, and an execution timeout. Fresh isolate per invocation. Nothing persists.

The agent's workflow for a complex task: search for relevant tools, inspect the ones that look right, write TypeScript that orchestrates them, submit it. One round trip for the whole thing.
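What the agent submits in that last step might look like the sketch below. `callTool` stands in for the sandbox's bridge to registered tools, and every tool name and field is hypothetical -- the point is the shape: multiple calls, branching, and intermediate state all inside one submission.

```typescript
// A sketch of code an agent might submit to call_tool_chain.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<any>;

async function refundWorkflow(callTool: ToolCall, orderId: string) {
  // Steps 1-2: look up the order and its payment. These intermediate
  // results live in the sandbox, never in the agent's context window.
  const order = await callTool("orders.get", { id: orderId });
  const payment = await callTool("payments.get", { id: order.paymentId });

  // Step 3: branching happens in code, not in a separate reasoning cycle.
  if (payment.status !== "captured") {
    return { refunded: false, reason: "payment not captured" };
  }

  // Step 4: issue the refund and return only the final output.
  const refund = await callTool("refunds.create", {
    paymentId: payment.id,
    amount: payment.amount,
  });
  return { refunded: true, refundId: refund.id };
}

// A mock bridge so the sketch runs outside any sandbox.
const mock: ToolCall = async (name, _args) =>
  ({
    "orders.get": { paymentId: "pay_1" },
    "payments.get": { id: "pay_1", status: "captured", amount: 4200 },
    "refunds.create": { id: "ref_1" },
  } as Record<string, any>)[name];
```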

Compare that to traditional agent tool calling: discover tools (1 call), call tool A (1 call), reason about result, call tool B (1 call), reason again, call tool C... You're looking at 10+ round trips for what code mode handles in one.

Models are measurably great at writing code and measurably bad at navigating massive tool lists while maintaining state across dozens of sequential calls. Code mode leans into the thing they're good at.

how to build an MCP server from your API

The standard path: take your OpenAPI spec, use a generator like FastMCP or the official MCP SDK, create tools for each endpoint, deploy on Cloudflare Workers or your own infra, handle auth yourself. Tutorials from WorkOS and the official docs walk through this. It works for small APIs with a handful of endpoints.

The problems start when you move past the tutorial stage. For a production-ready MCP server, you need to handle OAuth 2.1 authentication (the MCP spec mandates PKCE), manage per-user token storage, implement rate limiting, set up audit logging, handle container isolation if you're multi-tenant, and keep up with the spec as it evolves.
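Of that list, the PKCE requirement is concrete enough to sketch. A minimal version of the verifier/challenge pair an MCP client generates before redirecting to the authorize URL, using Node's built-in crypto (the `pkcePair` helper is ours, not part of any MCP SDK):

```typescript
import { createHash, randomBytes } from "node:crypto";

// PKCE (RFC 7636) with the S256 method: the client keeps the random
// verifier secret and sends only its SHA-256 hash (the challenge)
// with the authorization request.
function pkcePair() {
  const verifier = randomBytes(32).toString("base64url");  // 43-char secret
  const challenge = createHash("sha256").update(verifier).digest("base64url");
  return { verifier, challenge, method: "S256" as const };
}

const { verifier, challenge } = pkcePair();
// Later, the client proves possession by sending `verifier` with the
// token request; the server re-hashes it and compares to `challenge`.
```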

Then there's security. The OWASP MCP Top 10 dropped in 2026, and the numbers should give anyone building their own MCP server pause: 38% of servers have no authentication, and over 30 CVEs were filed against MCP implementations in just the first two months of 2026. We wrote about this problem early in MCP security: the unvetted server problem and again in the real MCP security gap. The security surface of a production MCP server is real, and it's yours to own if you build it yourself.

Building an MCP server is straightforward. Building one that's production-ready, secure, and useful for multi-step workflows is a different project entirely.

the missing piece: workflow knowledge for production-ready MCP servers

Code mode solves the context and execution problem. It doesn't solve the knowledge problem.

An agent with three tools and a 2,500-endpoint API can now efficiently search, discover, and execute. But it still has to figure out which 7 endpoints to chain together for a customer refund, in what order, with what parameters. The API spec tells you what each endpoint does. It doesn't tell you how they work together.

We covered this in depth in AI automation is broken. Here's what actually works. The gap between "can call APIs" and "can automate work" is workflow knowledge -- the understanding of which tools to chain, what dependencies exist between them, and what to do when things fail.

This is where search quality matters more than people realize. When the agent calls search_tools with "process a customer refund," what comes back determines whether the workflow succeeds or fails. A naive keyword search returns every endpoint with "refund" in the description. A search backed by actual workflow knowledge returns the specific sequence of endpoints, in order, with context about parameter dependencies and failure modes.

That's the difference between a tool search and a workflow search. Endpoint-level search isn't enough for the same reason endpoint-level MCP isn't enough.
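To make the contrast concrete, here's a hypothetical sketch of the two return shapes -- every tool name and field is invented for illustration:

```typescript
// Naive keyword search: a flat, unordered list of near-matches the
// agent still has to sequence on its own.
const keywordResults = [
  "refunds.create", "refunds.get", "refunds.list", "orders.refundable",
]; // ...and every other endpoint mentioning "refund"

// Workflow search: the ordered sequence, with parameter dependencies
// and failure modes made explicit.
const workflowResult = {
  goal: "process a customer refund",
  steps: [
    { tool: "orders.get", produces: ["paymentId"] },
    { tool: "payments.get", needs: ["paymentId"], produces: ["amount"] },
    { tool: "refunds.create", needs: ["paymentId", "amount"] },
  ],
  caveat: "refunds.create fails on payments not yet captured",
};
```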

building a production-ready MCP server for SaaS

MCP is a protocol. A good one. But a protocol is plumbing. It moves data between agents and tools. It doesn't make the tools intelligent.

Making MCP servers intelligent means adding a knowledge layer behind them. At Hintas, we ingest a company's OpenAPI specs and documentation, fill in missing descriptions in the specs, chunk documents and annotate each chunk with the context that makes it findable, and build a search layer that returns workflow-level results. When an agent searches for "process a refund," it gets back the ordered sequence of tools with parameter dependencies resolved, not a flat list of keyword matches.

The MCP server is the delivery mechanism. The knowledge layer is the product. A well-formatted toolbox is still a toolbox. Automation infrastructure that agents can actually use to complete real tasks requires workflow knowledge behind the protocol.

how Hintas compares to other approaches

| | DIY (SDK + OpenAPI) | Speakeasy | Stainless | Hintas |
| --- | --- | --- | --- | --- |
| OpenAPI to MCP server | Manual | Automated | Automated | Automated |
| Code mode support | Build yourself | Partial | Yes | Yes |
| Workflow knowledge layer | No | No | No | Yes |
| Managed hosting + SSL | No | Yes (Gram) | Partial | Yes |
| OAuth 2.1 built-in | Build yourself | Yes | Yes | Yes |
| Security hardening | Your responsibility | Basic | Basic | Managed (OWASP-aware) |
| Multi-step workflow search | No | No | No | Yes (three-signal hybrid + RRF) |
| Learns from execution | No | No | No | Yes |

MCP's 2026 roadmap is focused on making the protocol production-ready: better auth, audit trails, gateway support, transport scalability. All necessary. But protocol improvements don't fix the workflow knowledge gap. An agent with perfect MCP transport and no workflow knowledge still fails at multi-step tasks.

MCP handles agent-to-tool communication. The complementary A2A protocol handles agent-to-agent coordination. Hintas is built to support both -- the knowledge layer is protocol-agnostic, even though MCP is the primary delivery mechanism today.

The protocol is solved. The knowledge layer is the hard part. And it's what separates MCP servers that look good in demos from MCP servers that work in production.


If you're building a SaaS product that needs to connect to AI agents, Hintas extracts your workflow knowledge from existing API specs and docs, validates it, and deploys it as a production-ready MCP server with built-in auth, security, and workflow intelligence. Request early access.