AI automation is broken. Here's what actually works.
Why agentic automation fails at multi-step tasks and what SaaS companies need to build instead.
Gartner says 40% of enterprise apps will have task-specific AI agents by end of 2026, up from under 5% in 2025. Deloitte reports that 75% of companies will be investing in agentic automation this year. The money is real.
And most of it is going to fail.
Not because the models are bad. Because the infrastructure is wrong.
why AI automation fails at multi-step tasks
The default approach to AI automation in 2026 looks like this: take your API, wrap each endpoint as a tool, hand the toolbox to an agent, and hope it figures out the rest.
It doesn't.
MCPMark, the most demanding MCP benchmark to date, tested agents on realistic CRUD workflows across real tools. The best model, gpt-5-medium, managed a 52.6% pass rate. Claude Sonnet 4 and o3 both fell below 30%. On average, agents needed 17.4 tool calls per task across 16.2 execution turns. That's a lot of round trips to still fail half the time.
Individual tasks have gotten better. OSWorld scores have climbed from 12% to the 70-80% range for top models on single-application GUI tasks. But those gains come from better orchestration frameworks layered on top, and they don't transfer to multi-tool API workflows -- exactly the scenario MCPMark measures with real CRUD operations across real MCP servers. A real-world freelance task benchmark found agents completing under 3% of paid tasks. Manus managed 2.5%. GPT-5 hit 1.7%.
The pattern is consistent: agents handle isolated operations fine. Chain seven of them together with dependencies, and everything falls apart.
the knowledge gap, not the intelligence gap
The instinct is to blame the model. Upgrade to a bigger one, add more context, try a different prompt. But the failure pattern across these benchmarks points somewhere else: agents don't fail because they can't reason. They fail because nobody told them how the work actually gets done.
A customer refund isn't one API call. It's seven: verify the order, check return status, calculate the amount, reverse the payment, update inventory, send confirmation, log for compliance. Each step depends on the output of the previous one. Get the order wrong and you refund a non-existent purchase. Fail midway and you've got a reversed charge with no inventory adjustment. This is an AI agent infrastructure problem, not a model problem.
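To make the dependency chain concrete, here's what those seven steps look like as straight-line code. The endpoint paths and field names are hypothetical; the shape is the point:

```python
import requests

BASE = "https://api.example-commerce.test"  # hypothetical API

def refund(order_id: str) -> None:
    order = requests.get(f"{BASE}/orders/{order_id}").json()           # 1. verify the order
    ret = requests.get(f"{BASE}/returns/{order['return_id']}").json()  # 2. check return status
    if ret["status"] != "received":
        raise RuntimeError("return not received; refunding now would be premature")
    amount = order["total"] - order.get("restocking_fee", 0)           # 3. calculate the amount
    payment = requests.post(
        f"{BASE}/payments/{order['payment_id']}/reverse",
        json={"amount": amount},
    ).json()                                                            # 4. reverse the payment
    requests.post(f"{BASE}/inventory/restock",
                  json={"items": order["items"]})                      # 5. update inventory
    requests.post(f"{BASE}/notifications",
                  json={"order_id": order_id, "type": "refund"})       # 6. send confirmation
    requests.post(f"{BASE}/audit-log",
                  json={"event": "refund", "payment_id": payment["id"]})  # 7. log for compliance
```

Every call past the first needs output from an earlier one, and a failure after step 4 leaves money moved with nothing reconciled. None of that structure is visible in the endpoint schemas.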
That ordering knowledge, those parameter dependencies, those failure handling paths -- they live in engineers' heads, in Cypress test suites, in Confluence pages that three people maintain. They don't live anywhere an agent can access them.
As we wrote in Why 40% of AI projects fail, the projects that succeed encode workflow knowledge explicitly. The projects that fail hand agents a pile of endpoints and expect them to derive the workflow from schema descriptions. This is the gap between "can call an API" and "can do the job."
why MCP wrappers don't become SaaS automation
MCP adoption is accelerating. 9,400+ public servers. 97 million monthly SDK downloads. 78% of enterprise AI teams have at least one MCP-backed agent in production. Every major AI vendor supports it.
But the default MCP server -- the kind you generate from an OpenAPI spec -- gives you one tool per endpoint. That's 2,500 tools for the Cloudflare API. An agent looking at that list has to figure out which seven of those 2,500 tools to call, in what order, with what parameters, before it can do anything useful.
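In code, that default generation is almost nothing -- which is exactly the problem. A rough sketch, assuming an OpenAPI 3.x spec and an illustrative tool-schema shape:

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def tools_from_openapi(spec: dict) -> list[dict]:
    """One tool definition per path+method pair -- the default MCP wrapper."""
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            if method not in HTTP_METHODS:
                continue
            tools.append({
                "name": op.get("operationId", f"{method}_{path}"),
                "description": op.get("summary", ""),
                "inputSchema": op.get("parameters", []),
            })
    return tools

spec = json.load(open("openapi.json"))  # point this at a large spec like Cloudflare's
print(len(tools_from_openapi(spec)))   # thousands of tools, zero workflow structure
```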
This is the same problem the pre-MCP world had, just with better plumbing. You've standardized how agents discover and call tools. You haven't told them which tools matter for a given task, or how those tools relate to each other. As we covered in endpoint-level MCP isn't enough, it's a toolbox without instructions. And toolboxes don't automate anything.
There's also a security problem. The OWASP MCP Top 10 landed in 2026, and the numbers aren't great: 38% of MCP servers have no authentication at all. Rolling your own MCP server means owning that security surface -- auth, token handling, permission boundaries, audit logging -- on top of the workflow problem.
what workflow automation actually requires
The companies that get agentic automation working in production share a pattern. They treat workflow knowledge as an engineering artifact, not something the model will figure out from context. We've seen this across our work building Hintas and in how successful teams measure AI ROI at the workflow level.
Start with extraction. Your OpenAPI specs already encode which endpoints exist. Your E2E test suites already encode the order they're called in. Your runbooks describe what to do when step 4 fails. The knowledge exists -- scattered across formats and systems, but it exists. Nobody needs to write it from scratch.
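As a toy version of that extraction, a single regex pass over a Cypress spec recovers the order of API calls. A real pipeline would parse the test properly; the file path here is hypothetical:

```python
import re

# cy.request("METHOD", "url") calls, in source order
CALL = re.compile(
    r"cy\.request\(\s*['\"](GET|POST|PUT|PATCH|DELETE)['\"]\s*,\s*['\"]([^'\"]+)['\"]"
)

def workflow_from_spec(path: str) -> list[tuple[str, str]]:
    """Return (method, endpoint) pairs in the order the test makes them."""
    with open(path) as f:
        return CALL.findall(f.read())

steps = workflow_from_spec("cypress/e2e/refund.cy.js")  # hypothetical spec file
# e.g. [("GET", "/orders/{id}"), ("GET", "/returns/{id}"),
#       ("POST", "/payments/{id}/reverse"), ...]
```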
Then validation. An extracted workflow that hasn't been tested against a staging environment is a hypothesis, not a production path. You need execution against staging, combined with human-in-the-loop review, before anything goes live. As we wrote about in running AI workflows in production, the gap between demo and production is validation.
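A sketch of that gate, reusing the (method, endpoint) steps from the extraction sketch above, with a placeholder staging URL:

```python
import requests

STAGING = "https://staging.example-commerce.test"  # hypothetical environment

def validate(steps: list[tuple[str, str]], params: dict) -> bool:
    """Execute each extracted step against staging; any failure rejects the workflow."""
    for method, endpoint in steps:
        resp = requests.request(method, STAGING + endpoint.format(**params))
        if not resp.ok:
            print(f"step failed: {method} {endpoint} -> {resp.status_code}")
            return False
    return True

# Passing staging makes the workflow a candidate, not a release:
# a human reviews the execution trace before anything goes live.
if validate(steps, {"id": "test-order-001"}):
    print("staging pass -- queue for human review")
```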
Then delivery through a standard protocol. MCP is the right choice. Not because it's magic, but because it's the only protocol every major AI client supports. An agent that speaks MCP can connect to validated workflows without custom integration work.
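With the official MCP Python SDK, exposing a validated workflow as one tool is short. A minimal sketch, importing the hypothetical refund function from earlier:

```python
from mcp.server.fastmcp import FastMCP

from refund_workflow import refund  # hypothetical module holding the earlier sketch

mcp = FastMCP("commerce-workflows")

@mcp.tool()
def refund_order(order_id: str) -> str:
    """Run the full refund workflow: verify order, check return status,
    calculate amount, reverse payment, restock, notify, log for compliance."""
    refund(order_id)
    return f"refund completed for order {order_id}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; any MCP client can connect
```

The agent sees a single tool whose description carries the whole workflow, instead of seven endpoints it has to sequence itself.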
The infrastructure determines whether the model can do anything useful. And right now, for most companies, that infrastructure doesn't exist.
how Hintas builds the automation layer
We built Hintas around a specific bet: the truth of how software works already exists in a company's API specs, test suites, and documentation. You shouldn't have to rebuild that knowledge with an SDK or a drag-and-drop workflow builder. You should extract it from what already exists.
The pipeline: ingest OpenAPI specs and company docs. Use LLM analysis to fill in missing spec descriptions. Chunk and enrich documents with context that makes them searchable -- synthetic questions, contextual prefixes, noise filtering. Build a three-signal search layer (passage vectors, question vectors, keyword search), fused via Reciprocal Rank Fusion, that returns workflow-level results rather than individual tool matches. Validate against staging. Deploy as a managed MCP server with built-in OAuth 2.1, rate limiting, and audit logging.
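The fusion step itself is small. A sketch of Reciprocal Rank Fusion with the conventional k=60 constant and illustrative document IDs:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists into one list, best first."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([
    ["refund-workflow", "order-lookup", "payment-reverse"],  # passage vectors
    ["refund-workflow", "return-policy"],                    # question vectors
    ["payment-reverse", "refund-workflow"],                  # keyword search
])
# "refund-workflow" ranks first: strong in all three signals.
```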
Other platforms in this space -- Composio, Truto, StackOne -- focus on pre-built integrations and connectors. Hintas takes a different approach: extracting workflow knowledge from what your engineering team has already built, rather than replacing it with another integration layer.
The result: agents connect to one MCP server and get access to the company's entire automation surface. Search for what you need, get back the workflow context, execute. The knowledge layer does the hard part.
And it gets better over time. Execution traces feed back into the system. An API that times out under load on Thursday mornings, an undocumented rate limit on the payment endpoint, a parameter format that the docs get wrong -- all of it gets captured and surfaced in future requests. The knowledge base accumulates operational knowledge that makes every subsequent execution more reliable.
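Mechanically, that loop can start as simply as attaching accumulated notes to future results. An in-memory sketch -- a production system would persist this alongside the knowledge base:

```python
from collections import defaultdict

# operational notes keyed by endpoint; a real system persists these
operational_notes: defaultdict[str, list[str]] = defaultdict(list)

def record_trace(endpoint: str, outcome: str, note: str) -> None:
    """Capture anything surprising that an execution turned up."""
    if outcome != "ok":
        operational_notes[endpoint].append(note)

def enrich_result(endpoint: str, result: dict) -> dict:
    """Attach accumulated operational knowledge to a search result."""
    result["operational_notes"] = list(operational_notes[endpoint])
    return result

record_trace("/payments/{id}/reverse", "rate_limited",
             "undocumented rate limit: reversals throttle past ~10/min")
```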
making SaaS AI-ready
The next wave of SaaS isn't going to compete on features. It's going to compete on whether agents can use those features to automate real work. We wrote about this shift in AI is the foundation, not a feature -- it's not something you bolt on later.
Deloitte projects that 35% of point-product SaaS tools will be replaced by AI agents or absorbed into larger agent ecosystems by 2030. SaaS pricing is already shifting from seats to usage-based and outcome-based models because agents don't need seats.
For SaaS companies, this isn't a feature request. It's an architectural decision. Your product needs to be agent-accessible, and agent-accessible means more than exposing an API. It means exposing the workflow knowledge that makes your API useful. This is what it means to make SaaS AI-ready -- not bolting on a chatbot, but exposing validated workflow knowledge through a standard protocol.
Building that from scratch -- extracting workflows, validating them, running the MCP infrastructure, handling auth, managing execution -- is months of engineering that doesn't ship a single feature your customers asked for. It's infrastructure. It doesn't differentiate your SaaS; it enables agents to access the differentiation you've already built.
That's what an automation layer looks like. Not another AI feature. Infrastructure that makes your existing product work in a world where agents are doing the work. If you want to understand the technical details of how MCP enables this, we break down the protocol and the code execution pattern in MCP explained: why your agent should write code, not make tool calls.
If you're building a SaaS product that needs to be agent-accessible, Hintas extracts your workflow knowledge from your existing API specs and docs, validates it, and deploys it as a managed MCP server. Request early access.

