The Break Rate
Researchers tested AI's hottest protocol. It failed more than half the time. The companies building on it aren't telling investors.
Plaintext
February 15, 2026
In December 2025, Anthropic handed its most consequential creation to the world. The company's Model Context Protocol — MCP, the open-source wiring that lets AI agents connect to databases, execute code, send emails, and move money — was donated to the newly formed Agentic AI Foundation, a governance body that would steward it as an open standard. Within weeks, the biggest names in technology had lined up. Microsoft introduced an MCP server called MarkItDown and began adding MCP-compatible tooling alongside its Copilot connectors. Salesforce wired MCP into Agentforce, so a sales agent could pull a customer quote from one system, generate a contract in another, and file it in the CRM without a human touching a keyboard. By January, according to The New Stack and other trade outlets, Amazon's advertising division, OpenAI, and companies across ad tech had all announced MCP-compatible integrations. The trade press called MCP a "de facto standard," a "foundation technology." If you were building an AI agent that touched the real world, trade press argued it was increasingly likely you were building on MCP.
Then researchers started testing it.
On January 24, Narek Maloyan and Dmitry Namiot posted a paper to arXiv, the open-access preprint server. They had built a benchmark called MCPBench and pointed it at five MCP server implementations across 847 attack scenarios. The baseline attack success rate — before any defenses — was 52.8 percent. More than half the time, the attacks worked.
That was just the first paper. Over the next three weeks, seven more landed.
What Plaintext found: Between January 24 and February 12, eight independent research teams published studies probing the security of AI agent protocols and frameworks. Their findings converge on a set of numbers the industry has not reckoned with: mean failure rates of 29 to 51 percent across agent architectures, attack success above 85 percent when adversaries adapt their strategies, and existing defenses that either don't work or break the agents they're supposed to protect. Meanwhile, at least 124 companies have filed 10-Ks and 10-Qs with the SEC since October that mention agentic AI risks, but their disclosures are largely boilerplate and contain no quantified failure rates. As of February 15, the gap between what researchers have measured and what companies tell investors has no precedent — because, until three weeks ago, no one had measured these failure rates in a standardized way.
As of Feb. 15, web searches show no coverage of these measured findings in mainstream tech or business outlets — the Wall Street Journal, Financial Times, Bloomberg, The Verge, or Wired. Where coverage exists, it has focused on implementation-specific CVEs, not architectural break rates.
What MCP Promised
A chatbot takes a question and gives an answer. An agent takes a goal and goes to work. It reads your email, checks your calendar, queries a database, drafts a response, submits a purchase order, books a flight. Each action requires the agent to reach across a boundary — from the model's reasoning layer into real systems where consequences live. MCP is the protocol that governs those reach-throughs. It defines how an agent discovers available tools, requests permission to use them, and executes actions on external servers.
Anthropic open-sourced MCP in November 2024. Within a year, the enterprise software industry had embraced it as the interoperability layer it had always wanted — the thing that would finally let systems talk to each other through an AI intermediary. The advertising world saw automated media buying: agents negotiating and placing ad campaigns across platforms, no human intermediary required.
The assumption underneath all of it: that the protocol was secure enough to trust with real operations.
That assumption now has a number. Several numbers, in fact, from several independent labs. None of them are reassuring.
Three Numbers
The research that landed between January 24 and February 12 includes eight papers from different teams at different institutions. Rather than march through them sequentially, here are the three findings that define the problem and a brief chorus of supporting work that points in the same direction.
The baseline: 52.8 percent.
Maloyan and Namiot's MCPBench established the floor. Against stock MCP servers, with no special defenses, an adversary's attacks succeeded 52.8 percent of the time. The researchers also proposed a fix — MCPSec, a protocol extension that dropped the success rate to 12.4 percent. That's a dramatic improvement. But MCPSec exists only as a research prototype. No major platform has adopted it. The protocol the industry is shipping runs at the undefended number.
In a companion paper published the same day, Maloyan and Namiot surveyed 78 prior studies and catalogued 42 distinct attack techniques. Their most sobering finding: when attackers use adaptive strategies — adjusting their approach based on what defenses they encounter, the way any competent adversary would — success rates against state-of-the-art defenses exceeded 85 percent. Most defenses, they wrote, achieve less than 50 percent mitigation against sophisticated attacks.
The architecture-by-architecture breakdown: 29 to 51 percent.
Two weeks later, on February 7, Sai Puppala and seven co-authors published a framework called AgentFence. Where MCPBench tested the protocol layer, AgentFence tested the agents themselves — eight common architectures, fourteen categories of trust-boundary attacks. They measured what they called the mean security break rate, or MSBR: the share of attacks that got through.
Different teams use different yardsticks — AgentFence's MSBR measures something distinct from MCPBench's attack success rate under preset scenarios — but across both frameworks, the results cluster between roughly 30 and 50 percent. LangGraph, the most hardened architecture, held up best at 0.29 (±0.04). AutoGPT, widely used in early agent experiments, fared worst at 0.51 (±0.07).
The breakdown by attack type tells the operational story. Authorization confusion — where the agent is tricked into acting with permissions it shouldn't have — succeeded 54 percent of the time. Retrieval poisoning, where the data an agent pulls from external sources has been laced with hidden instructions, got through 47 percent. And denial-of-wallet, the attack class with perhaps the most immediate financial consequence, broke through 62 percent of the time.
Picture it concretely: a sales agent queries a customer database. The response contains a hidden instruction — invisible to the human user, legible to the model — that tells the agent to also query a pricing API, then a competitor-analysis API, then a market-data API, each one burning tokens and racking up cloud charges. By the time anyone notices, the agent has run up thousands of dollars in costs on the attacker's behalf. That's denial-of-wallet. Six out of ten attempts succeed.
Puppala and his co-authors did not respond to Plaintext's request for comment.
The defense that doesn't work: monitoring bypass.
Many companies have adopted what seems like a reasonable safeguard: a second AI model that watches the first one for signs of manipulation. Think of it as a security camera trained on the agent. On February 4, Jafar Isbarov and Murat Kantarcioglu published a paper that demonstrated why that camera has a blind spot.
They showed that frontier-scale monitor models — including those with 72 billion parameters — could be bypassed by attacker agents of similar or smaller capability. The method was disarmingly simple: instead of trying to fool the monitor directly, the attacker uses prompt injection to embed malicious instructions inside the agent's own behavior. The agent becomes the delivery mechanism. The monitor can't distinguish attack from normal operation because the malicious instructions are now woven into the agent's chain-of-thought — part of how the agent "thinks."
Their conclusion, stated in the paper, was direct: monitoring-based defenses are "fundamentally fragile regardless of model scale." Not limited by current capability. Fragile by design.
The chorus. Four other papers from the same three-week window reinforce these findings from different angles. AgentDyn, from a team at several U.S. universities, ran 60 tasks and 560 injection test cases and found that existing defenses are "either not secure enough or suffer from significant over-defense" — security tight enough to stop attacks also blocked legitimate operations, leaving the agents useless. The 4C Framework paper, from CSIRO's Data61 in Australia, argued that security approaches built for traditional software fail because they treat agents like servers instead of reasoning entities that interact, make decisions, and propagate trust. A protocol comparison paper examined MCP alongside Google's Agent2Agent, Agora, and ANP, identifying twelve protocol-level risks; in the authors' multi-server testbed, a compromised server injected a mid-call prompt that redirected tool execution to an unintended provider in a measurable share of batch tasks — the equivalent of a financial document being routed to an unauthorized endpoint because a malicious prompt looked like internal reasoning. And MalTool, from researchers at Duke and Stanford, generated more than 6,400 malicious tools — some standalone, some embedded in legitimate tools — and found that existing detection methods, including VirusTotal, showed "limited effectiveness." The full list of papers, with arXiv identifiers and publication dates, is in the evidence appendix.
Not Just Theory
If the papers provided measurements in controlled settings, real-world vulnerability disclosures from the same period provided something closer to proof of concept.
In the summer of 2025, a security firm called Cyata — credited in subsequent coverage by SecurityWeek, CSO Online, and Infosecurity Magazine — reported three vulnerabilities in Anthropic's own MCP server implementation: mcp-server-git, the tool that lets AI agents interact with Git code repositories. One allowed unrestricted initialization of Git repositories anywhere on the host file system. Another enabled argument injection through the git diff command. A third bypassed path validation, potentially giving an attacker access to files outside the intended directory. All three could be triggered via prompt injection — a malicious instruction hidden in data the agent processes. All three carried CVSS severity scores between 6.3 and 6.5.
Anthropic accepted the reports in September 2025 and shipped patches in December — roughly six months after disclosure. That timeline is not unusual for the software industry. But it means that for six months, the reference implementation of the protocol Anthropic invented and the industry was adopting was exploitable via the protocol's most well-known weakness. Neither Anthropic nor the researchers reported exploitation in the wild during that window. The risk was latent — but it was real, and it was in the canonical implementation.
It wasn't just Anthropic. In January, according to a disclosure reported by Dark Reading, researchers found a server-side request forgery vulnerability in Microsoft's MarkItDown MCP server — the kind of flaw where a crafted URL could cause the server to fetch internal resources, potentially including cloud metadata endpoints that expose credentials and configuration data. The researchers went further: they analyzed more than 7,000 MCP servers and estimated that roughly 36.7 percent — more than 2,500 servers — harbored the same class of SSRF exposure. The vulnerability wasn't unique to one implementation. Researchers estimated roughly 36.7 percent of observed MCP servers showed a similar SSRF exposure — a recurring implementation pattern in how many MCP servers handle external requests, not necessarily a flaw embedded in the protocol specification itself.
Google's Gemini wasn't immune either. As reported by The Hacker News and SecurityWeek in January, researchers demonstrated that attackers could steal calendar data from Gemini users by embedding malicious instructions in Google Calendar invitations. When the AI processed the invite, it followed the hidden instructions and exfiltrated private meeting data. Google confirmed and addressed the vulnerability, catalogued across four CVEs.
All of these attacks are variations on the same theme: prompt injection. A malicious instruction smuggled into data that an AI agent reads and acts on. Major AI labs have cautioned publicly that prompt injection is unlikely to have a complete technical solution for web-browsing agents — a point OpenAI and others have acknowledged in various forms throughout 2025, though pinning down a single definitive statement is difficult because the concession tends to surface in technical discussions, blog posts, and developer forums rather than in a clean press release. The eight papers published in January and February now attach specific failure rates to that broad admission.
What the Filings Say
Plaintext searched the SEC's EDGAR full-text system for 10-K and 10-Q filings between October 1, 2025, and February 15, 2026, containing the word "agentic" alongside "risk" and "security." The search returned 124 filings. A narrower query — "agentic" with "risk factor" — returned 37. We reviewed a sample of filings from companies building on or integrating agentic AI, including HubSpot, Microsoft, and Salesforce. The risk-factor language across these filings is careful, forward-looking, and largely interchangeable from one company to the next. It covers the category of risk — AI features may be vulnerable, outputs may be unexpected, adversarial attacks are possible. It does not describe the magnitude.
What the filings don't contain: any reference to measured failure rates. No mention of break rates in the range of 29 to 62 percent. No acknowledgment that MCP has been shown, in controlled testing, to fail more than half the time under systematic attack. No discussion of the CVEs in Anthropic's own MCP server or the SSRF exposure rate across the broader server ecosystem. We did not find a counterexample in our sample — a filing with materially more specific language about agent security metrics — though we cannot rule out that one exists among the filings we did not review in full.
The timing context matters. All but a handful of the 124 filings were drafted before the January 24–February 12 research cluster landed. Companies can't disclose research that doesn't yet exist. But even among the filings submitted after February 3, Plaintext found none that included quantified failure rates or referenced emerging protocol-level security research, as of February 15.
There is no SEC rule requiring companies to disclose specific attack success rates. The SEC's 2023 cybersecurity disclosure rules require companies to describe their risk management processes and report material incidents, but they don't mandate disclosure of specific vulnerability metrics. The question — and it is genuinely open — is whether quantified failure rates of 29 to 62 percent for a protocol that a company has standardized on constitute a material risk that demands specific disclosure, or whether they remain the kind of technical detail that risk-factor boilerplate is designed to cover in the aggregate. That question will likely be answered by the SEC's Division of Corporation Finance, by courts, or by the first shareholder lawsuit filed after something breaks.
Plaintext sent detailed questions about MCP security, the January–February research findings, and disclosure practices to Anthropic, Microsoft, Google, OpenAI, Salesforce, and the Agentic AI Foundation on February 14 and 15. As of publication on February 16, none had responded.
Why the Architecture May Not Patch Away
One response to all of this would be: early software always has bugs; patches will come; the numbers will improve. To understand why that response may be too optimistic, it helps to see where the protocol's openings come from.
MCP includes a feature called sampling — it lets servers send prompts back to the AI model in the middle of a tool call. In principle, this enables sophisticated multi-step operations: a server can ask the model to reason about intermediate results before proceeding. In practice, it means a compromised server can plant instructions that look like part of the agent's own thinking. In the protocol comparison paper's case study, this mechanism allowed tool execution to be redirected to the wrong server entirely — the equivalent of an agent routing a financial document to an unauthorized endpoint because a malicious server's prompt looked like internal reasoning.
In multi-server configurations, trust propagates implicitly. If an agent trusts Server A, and Server A invokes Server B, the agent effectively trusts Server B without ever evaluating it. And servers can claim permissions without cryptographic proof, because the protocol doesn't require what security engineers call capability attestation.
These aren't bugs fixable by the next point release. They are design choices that make agents useful — the ability to chain tools, pass context between servers, and execute complex workflows is the whole point. The papers converge on a structural conclusion: the features that make modern agent protocols work are, architecturally, the same ones that make them breakable. Hardening them means limiting what agents can do. The MCPSec extension that dropped attack success to 12.4 percent works, in part, by restricting the very flexibility that made MCP attractive.
OWASP, the security standards body, acknowledged this tension when it published its "Top 10 for Agentic Applications 2026" — a formal recognition that agents require their own threat taxonomy, distinct from traditional web applications or APIs. The National Institute of Standards and Technology issued a request for information in January, still gathering public input on how to think about agent security. These are the first institutional motions toward a response. They are months behind the deployment curve. NIST is asking questions. Salesforce is already selling.
Three Million Agents
A vendor-sponsored survey of 750 IT executives, conducted by Gravitee and Opinion Matters in December 2025, put some scale on the deployment. The survey estimated roughly three million AI agents operating within corporations. Fifty-three percent were not actively monitored or secured. Eighty-eight percent of respondents reported experiencing or suspecting an AI agent–related security or data privacy incident in the past twelve months. As with any vendor-sponsored survey, these numbers should be treated as directional rather than precise — the respondents are self-selected, the definitions of "agent" and "incident" are the survey's own, and Gravitee sells API management tools that benefit from security anxiety.
Still, the order of magnitude lines up with what companies are saying publicly. Major banks have discussed large AI investments — JPMorgan Chase has cited billions of dollars in AI spending, and Goldman Sachs has described significant productivity gains from coding agents — but the details of their agent architectures and security postures are not public. The defense sector has explored agent systems, a subject covered by Defense News and others, but the scope and scale of those deployments are classified or undisclosed. What's clear from the survey, from public statements, and from the adoption announcements is that agent deployment has moved past the pilot stage. These are production systems, in industries that handle sensitive data and consequential decisions, built on protocols with the failure rates the papers describe.
Set those numbers against the research. Millions of agents. More than half unmonitored. Break rates between 29 and 62 percent. The defense that seems most intuitive — a second AI watching the first — "fundamentally fragile." The protocol extension that actually works exists only as a research prototype.
This story has real limits. The comparison between academic break rates and production environments is imperfect. Lab conditions differ from enterprise deployments. Some companies may have proprietary defenses the papers didn't capture. The AgentFence measurements cover eight agent archetypes, not every deployed system. MCPBench tested five MCP server implementations, not the full ecosystem. And a generic SEC filing doesn't mean a company is ignoring the problem. Security teams at Microsoft, Salesforce, and Anthropic may well be aware of these papers and building mitigations. None of the companies we contacted chose to say so.
What Comes Next
The next earnings cycle begins in April. Companies that have adopted MCP and deployed agentic AI systems will face analysts' questions about their AI strategies. Whether any analyst asks about measured break rates depends, in part, on whether the research sitting on arXiv reaches the people who read SEC filings.
The Agentic AI Foundation, which now stewards MCP, has not announced a security audit or a timeline for addressing the architectural vulnerabilities catalogued across the February papers. NIST's request for information is open, but standards take years. The eight papers sit in the public record — from independent teams, posted in 22 days, each measuring a piece of the same problem.
In the February 11 protocol comparison paper, the authors identified twelve distinct risks in the protocol layer that MCP shares, in varying degrees, with its competitors. Those risks are structural, not incidental. The MCPSec proposal that cut attack success to 12.4 percent proves that hardening is possible. The fact that no platform has adopted it, three weeks after the paper was published, proves that possible and actual are different things.
As of publication, the measured break rates haven't changed. The next MCP server is being deployed right now.
Evidence Appendix
Academic Papers (January 24 – February 12, 2026)
All papers verified via arXiv API with matching IDs and publication dates, as of February 15, 2026.
| Paper | arXiv ID | Published | Key Finding |
|---|---|---|---|
| MCPBench | 2601.17549 | Jan 24 | 52.8% baseline attack success across 847 scenarios on 5 MCP servers; MCPSec reduces to 12.4% |
| SoK (Systematic Analysis) | 2601.17548 | Jan 24 | Meta-analysis of 78 studies; 42 attack techniques; >85% success with adaptive strategies |
| 4C Framework | 2602.01942 | Feb 2 | Multi-agent security framework across Core, Connection, Cognition, Compliance dimensions (CSIRO Data61) |
| AgentDyn | 2602.03117 | Feb 3 | 60 tasks, 560 injection cases; existing defenses inadequate or over-defensive |
| Agent-as-a-Proxy | 2602.05066 | Feb 4 | Frontier-scale monitors (72B params) bypassed by similar-capability attacker agents |
| AgentFence | 2602.07652 | Feb 7 | MSBR 0.29±0.04 (LangGraph) to 0.51±0.07 (AutoGPT); Denial-of-Wallet 0.62±0.08 |
| Protocol Comparison | 2602.11327 | Feb 11 | 12 protocol-level risks across MCP, A2A, Agora, ANP; wrong-provider execution quantified |
| MalTool | 2602.12194 | Feb 12 | 1,200 standalone + 5,287 embedded malicious tools; existing detection shows limited effectiveness |
AgentFence Detailed Results (arXiv 2602.07652)
By Attack Class (Mean Security Break Rate):
| Attack Class | MSBR |
|---|---|
| Denial-of-Wallet | 0.62 ± 0.08 |
| Authorization Confusion | 0.54 ± 0.10 |
| Retrieval Poisoning | 0.47 ± 0.09 |
| Planning Manipulation | 0.44 ± 0.11 |
By Agent Architecture:
| Architecture | MSBR |
|---|---|
| AutoGPT | 0.51 ± 0.07 (highest) |
| LangGraph | 0.29 ± 0.04 (lowest) |
Fourteen trust-boundary attack classes evaluated across eight agent archetypes. Authors: Sai Puppala and seven co-authors.
CVE Disclosures
Anthropic mcp-server-git (Cyata research)
- CVE-2025-68143: Unrestricted git_init (CVSS 6.5)
- CVE-2025-68144: Argument injection in git_diff (CVSS 6.3)
- CVE-2025-68145: Path validation bypass (CVSS 6.4)
- Reported: June–July 2025. Accepted: September 2025. Patched: December 2025 (version 2025.12.18)
- Sources: Cyata research disclosure; SecurityWeek, CSO Online, Infosecurity Magazine (January 2026)
- No exploitation in the wild reported by Anthropic or researchers.
Google Gemini (reported by security researchers)
- CVE-2026-0612, CVE-2026-0613, CVE-2026-0615, CVE-2026-0616
- Calendar data theft via indirect prompt injection in calendar invitations
- Google confirmed and addressed.
- Sources: The Hacker News, SecurityWeek (January 2026)
Microsoft MarkItDown MCP Server
- SSRF vulnerability disclosed; researchers estimated 36.7% of 7,000+ analyzed MCP servers share same SSRF exposure class
- Source: Dark Reading (January 20, 2026)
SEC Filing Search Methodology (EDGAR)
Search conducted February 15, 2026.
Broad search: "agentic" AND "risk" AND "security" in 10-K/10-Q filings, October 1, 2025 – February 15, 2026 → 124 filings
Narrow search: "agentic" AND "risk factor" in 10-K/10-Q filings, same period → 37 filings
Sample filings reviewed:
- HubSpot 10-K: Accession 0001193125-26-046646 (filed February 11, 2026). Risk Factors section.
- Microsoft 10-Q: Accession 0001193125-26-027207. Item 1A, Risk Factors.
- Salesforce 10-Q: Accession 0001108524-25-000238. Risk Factors.
Other filers identified: Alphabet, Adobe, Mastercard, Visa, American Express, JPMorgan Chase, NVIDIA, Intel, Qualcomm, and dozens of additional technology and financial services companies.
Limitation: Plaintext reviewed a sample of filings, not the full text of all 124. The characterization of disclosure language as "largely boilerplate" reflects the sample reviewed.
MCP Adoption Sources
| Company | Integration | Source |
|---|---|---|
| Anthropic | Created MCP; donated to Agentic AI Foundation, December 2025 | Multiple outlets |
| Microsoft | MarkItDown MCP server; MCP-compatible tooling alongside Copilot connectors | Dark Reading (Jan 20, 2026) |
| Salesforce | Agentforce MCP integration | The New Stack (Jan 26, 2026) |
| OpenAI | MCP-compatible tooling in platform | The New Stack (Jan 26, 2026) |
| Amazon Ads | MCP-compatible server integration | MediaPost (Jan 27, 2026) |
| Yahoo, PubMatic, Magnite | Ad Context Protocol built on MCP | MediaPost (Jan 27, 2026), Adweek |
Note: Specific characterizations of the depth of each company's MCP integration vary across trade press sources. Plaintext describes these as "MCP-compatible integrations" except where a more specific characterization is directly supported by a named source. The description of Microsoft's integration is based on the Dark Reading report on MarkItDown and related tooling; Plaintext did not independently confirm the depth of MCP integration in Microsoft's Copilot product.
Enterprise Deployment Survey
- Source: Gravitee / Opinion Matters survey of 750 IT executives, December 2025; reported CSO Online February 4, 2026
- Methodological note: Vendor-sponsored, self-reported survey. Gravitee sells API management and security tools. Definitions of "AI agent" and "security incident" are the survey's own. Results should be treated as directional estimates.
- Key findings: ~3 million AI agents operating within corporations; 53% not actively monitored or secured; 88% of respondents reported experiencing or suspecting AI agent–related security or data privacy incident in past 12 months
Institutional Responses
- OWASP: Published "Top 10 for Agentic Applications 2026." Sources: Dark Reading, CSO Online.
- NIST: Request for information on AI agent security, January 2026. Information-gathering stage.
- Agentic AI Foundation: Established to steward MCP; no public security audit announced as of Feb. 15, 2026.
Expert Sources
Coverage Gap Verification
Web searches conducted February 15, 2026, for: "AgentFence" security, "MalTool" 2602.12194, "AgentDyn" benchmark, "Agent-as-a-Proxy" prompt injection, agent protocol comparison 2602.11327. No coverage found in mainstream tech or business outlets (WSJ, FT, Bloomberg, The Verge, Wired). Existing coverage focuses on implementation-specific CVEs (Anthropic, Google, Microsoft) and general agent security concerns, not the architectural break rates measured across the January–February paper cluster.
Company Outreach
Plaintext sent questions on February 14–15 to:
- Anthropic: Regarding MCPSec adoption status, mcp-server-git CVE timeline, Agentic AI Foundation security audit plans.
- Microsoft: Regarding MarkItDown SSRF vulnerability and MCP integration security posture.
- Google: Regarding Gemini CVEs and agent security measures.
- OpenAI: Regarding MCP-compatible tooling and prompt injection defenses.
- Salesforce: Regarding Agentforce MCP integration and agent security testing.
- Agentic AI Foundation: Regarding planned security review of MCP.
As of February 16 publication, none had responded. Standard response windows for corporate communications departments are typically 24–72 hours; Plaintext will update this article if responses are received.