Claude Mythos and the Anthropic Leak: What the Capybara Incident Reveals About Frontier AI Security

On March 26, 2026, security researchers Roy Paz and Alexandre Pauwels of the University of Cambridge discovered that Anthropic had left nearly 3,000 internal assets in a publicly searchable data store, no authentication required. Among them was a draft blog post announcing a new AI model called Claude Mythos, internally tiered as Capybara. Fortune reviewed the documents and notified Anthropic, after which the company restricted access and attributed the incident to human error in its content management system configuration.
Five days later, Anthropic inadvertently published the complete source code of its Claude Code CLI tool to npm. The root cause was a missing .npmignore file that allowed a 59.8 MB source map to be shipped alongside the production package, linking to a Cloudflare R2 storage bucket with roughly 512,000 lines of unobfuscated TypeScript. Within hours, the repository had been mirrored across GitHub with over 84,000 stars before Anthropic could issue DMCA takedowns.
Two separate disclosures in less than a week. Both caused by configuration errors. Both at one of the best-funded AI safety companies in the world. That combination deserves careful analysis, not just as an operational failure, but as a signal about the infrastructure pressures that accompany frontier model development and what they mean for teams building on top of these systems.
What the Leaked Draft Actually Said
The draft blog post was structured as a standard product announcement. It described Capybara as a new model tier sitting above Anthropic's existing Opus line: larger, more capable, and more expensive to serve. According to the draft, the model achieved dramatically higher scores than Claude Opus 4.6 on software coding, academic reasoning, and cybersecurity benchmarks, though no specific numbers were included.
The most significant detail was not the benchmark framing. It was Anthropic's own risk language. The document stated that the model is "currently far ahead of any other AI model in cyber capabilities" and warned that it "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." Anthropic's planned rollout strategy reflected that concern directly: early access would be granted first to cyber defenders, giving them a head start before broader availability.
OpenAI has made parallel disclosures. Under its Preparedness Framework, GPT-5.3-Codex was classified as the first model with "high capability" for cybersecurity tasks, and the model was explicitly trained to identify software vulnerabilities. The convergence of these disclosures suggests that the industry has crossed a threshold where offensive cyber capability is now a primary characteristic of frontier models, not an edge case.
The Misconfiguration as a Security Class
The CMS incident at Anthropic was not a sophisticated attack. The content management system was configured to make uploaded assets public by default. That default was never changed for the assets in question. The fix, once identified, was immediate. No credentials were stolen, no lateral movement occurred, and no malicious actor was involved.
What makes it instructive is its typicality. Default-public storage configurations are one of the most commonly exploited misconfiguration patterns in cloud environments. The OWASP Top 10 has listed security misconfiguration in every edition since 2013. The NIST SP 800-53 control family CM-6 specifically addresses configuration settings and the risks of permissive defaults. None of that institutional knowledge prevented it here.
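The default-public pattern is mechanical enough to check for automatically. As an illustrative sketch, an audit can flag any object whose access control list grants access to an "everyone" group. The grant structure below mirrors the shape of S3-style ACLs, but the `is_public` and `audit` helpers are hypothetical, not any provider's API:

```python
# Illustrative sketch: flag stored objects whose ACL grants access to
# "everyone" groups, the default-public pattern described above.
# The helper functions are hypothetical; the group URIs are the ones
# AWS S3 uses for public and authenticated-user grants.

PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def is_public(acl: dict) -> bool:
    """Return True if any grant exposes the object beyond the account."""
    return any(
        grant.get("Grantee", {}).get("URI") in PUBLIC_GRANTEES
        for grant in acl.get("Grants", [])
    )

def audit(objects: dict) -> list:
    """Return the keys of objects that are publicly accessible."""
    return [key for key, acl in objects.items() if is_public(acl)]
```

Run on a schedule against every bucket that holds internal assets, a check like this turns the permissive default from a silent exposure into an alert.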
The Claude Code npm incident follows an equally well-documented pattern. Shipping debug artifacts to a production registry through a misconfigured build pipeline is a known supply chain risk class. The SLSA framework and NIST SP 800-218 both address this at the build provenance layer. The failure was not a gap in knowledge about what good looks like. It was a gap in enforcement at the pipeline level.
For security teams evaluating AI vendors, both incidents point to the same underlying question: what automated gates exist between internal tooling and public-facing infrastructure? At Anthropic, the answer in both cases appears to have been insufficient pre-publication checks and a reliance on manual opt-out rather than opt-in controls.
The Dual-Use Problem at Frontier Scale
The content of what leaked matters as much as the fact of the leak. Anthropic's own draft framed Capybara's cybersecurity capabilities as a risk that warranted extra caution beyond standard internal testing. That is an unusual posture for a product announcement. It reflects something the security community has been tracking for several years: as model capabilities scale, the gap between what a model can do for defenders and what it can do for attackers narrows, and then disappears.
Anthropic had already documented this in practice. In November 2025, the company published a report confirming that a Chinese state-sponsored group had used Claude's agentic capabilities to infiltrate roughly 30 global targets. The attackers leveraged the model's ability to automate multi-step reconnaissance and lateral movement tasks, precisely the kind of capability that Capybara is described as extending further.
NIST's AI Risk Management Framework categorizes dual-use risk under the GOVERN function, requiring organizations to identify where their AI systems can cause harm even when operating as intended. A model that scores dramatically higher on cybersecurity benchmarks than any prior system creates a measurable shift in that risk calculus. The question is not whether the capability exists but who controls access to it and under what conditions.
Anthropic's decision to release to cyber defenders first is a governance decision as much as a product one. It acknowledges that the rollout sequence itself is a security control. Defenders get time to understand the model's capabilities, build detection and mitigation tooling, and stress-test existing infrastructure before attackers have the same access. The effectiveness of that head start depends entirely on how well the access boundary is maintained during early access, which is precisely the kind of operational control that the CMS incident suggests requires more rigorous enforcement.
What This Means for Teams Running AI Infrastructure
For security leaders and AI engineers operating production agent systems, the Anthropic incidents surface three practical concerns.
The first is vendor operational security as a dependency risk. When a model provider leaks its own internal roadmap through a CMS misconfiguration, that is a signal about the maturity of its internal security controls. It does not mean the model itself is compromised, but it does mean that assumptions about the provider's operational hygiene should be treated as an input to your own risk model, not taken for granted.
The second is the supply chain exposure created by AI tooling at scale. The Claude Code npm incident is structurally identical to supply chain attacks that have affected widely used open-source packages. The difference is that Anthropic's disclosure was unintentional. The concern for teams using AI-generated code in production, which NeuralTrust's model code scanner is designed to address, is that the same exposure pathways that allowed Anthropic's source code to reach the public registry exist in any build pipeline that lacks signed provenance and automated artifact validation.
The third concern is what frontier cyber capability means for existing defenses. If the leaked characterization of Capybara is accurate, that it can identify and exploit vulnerabilities at a pace that outstrips defenders, then the threat surface for organizations running AI-adjacent infrastructure changes materially at the point of the model's broader release. Red teaming against current model generations gives some preparation, but the capability step Anthropic describes would require updated adversarial simulation to be meaningful.
Governance Controls That Apply Now
Organizations do not need to wait for Mythos to reach general availability to act on what the leak revealed.
Configuration management audits on AI vendor integrations are overdue at most organizations. The same default-public patterns that exposed Anthropic's CMS data are common in cloud storage configurations, CI/CD pipeline outputs, and model serving infrastructure. An inventory of where AI-related assets are stored, what their default access levels are, and who has authority to change them is a foundational step.
For teams using AI coding tools in production pipelines, the Claude Code incident is a concrete illustration of why artifact signing and pre-publish checks cannot be optional. The SLSA framework provides a structured maturity model for build integrity that applies directly to this class of risk. Integrating artifact validation into CI gates, before anything reaches a public registry, is the control that would have caught the npm misconfiguration before it became public.
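A pre-publish gate of this kind reduces to a file-level policy check against whatever a dry-run pack step reports as staged for publication. A minimal sketch, assuming you can enumerate the staged file list (the blocked patterns here are examples, not an exhaustive policy):

```python
# Illustrative pre-publish gate: reject the build if any staged file
# matches a blocked artifact pattern. The patterns are examples only.
import fnmatch

BLOCKED_PATTERNS = [
    "*.map",   # source maps can reconstruct the original source
    "*.env",   # environment files frequently hold secrets
    "src/*",   # raw source trees do not belong in a built package
    "*.pem",   # private key material
]

def check_artifacts(staged_files: list) -> list:
    """Return staged files that match a blocked pattern, sorted."""
    return sorted(
        f for f in staged_files
        if any(fnmatch.fnmatch(f, p) for p in BLOCKED_PATTERNS)
    )
```

A CI job would fail the pipeline whenever `check_artifacts` returns a non-empty list, which is exactly the opt-in enforcement that a missing .npmignore file silently bypasses.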
At the model governance layer, the rollout strategy Anthropic described for Capybara (sequenced access, starting with defenders) is a useful reference point for internal AI deployment policies. When your organization deploys a new model version or extends an agent's tool access, who gets access first, under what monitoring conditions, and what gates exist before broader rollout? NeuralTrust's behavioral threat detection provides the observability layer that makes sequenced rollouts governable rather than aspirational.
The NeuralTrust AI Gateway exists precisely for this kind of enforcement: controlling which models, tools, and data sources an agent can reach, and ensuring that access policy changes are audited rather than implicit.
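The sequenced-rollout idea reduces to a small, auditable policy check. A minimal sketch, with hypothetical phase dates and role names (a real deployment would drive these from an audited policy store, not constants in code):

```python
# Minimal sketch of a sequenced-rollout policy. The dates and role
# names are hypothetical examples, not Anthropic's actual schedule.
from datetime import date

# Rollout phases, in order: from each start date, the listed roles
# replace the previous grant set.
ROLLOUT = [
    (date(2026, 4, 1), {"cyber-defender"}),                       # early access
    (date(2026, 5, 1), {"cyber-defender", "internal"}),           # expanded
    (date(2026, 6, 1), {"cyber-defender", "internal", "general"}) # broad
]

def allowed_roles(today: date) -> set:
    """Return the set of roles granted access as of `today`."""
    granted = set()
    for start, roles in ROLLOUT:
        if today >= start:
            granted = roles
    return granted

def may_access(role: str, today: date) -> bool:
    """True if `role` falls inside the current rollout phase."""
    return role in allowed_roles(today)
```

Keeping the phase table as data rather than scattered conditionals is what makes the rollout sequence reviewable, and every deny decision is a log line a gateway can emit.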
The Structural Issue Behind Both Incidents
The CMS leak and the npm leak share a root cause that goes beyond individual configuration mistakes. Both occurred at the boundary between an internal development environment and a public-facing system. Both were caused by permissive defaults that required an explicit action to override. Both were discovered by external researchers rather than internal controls.
That pattern, external discovery of internally created exposure, signals that the monitoring layer between development and production is not catching the right things. At the speed Anthropic ships (roughly 30 model-related releases between February and March 2026), the margin for manual review at each boundary is narrow. Automated enforcement of access defaults, artifact validation, and configuration drift detection is not optional at that velocity. It is the only way to maintain a consistent security posture when the release cadence itself generates risk.
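Configuration drift detection, the last of those three controls, can be sketched as a comparison between a live configuration and an approved baseline. The keys below are illustrative, not a real product's settings:

```python
# Sketch of configuration drift detection: compare a live configuration
# against an approved baseline and report every key whose value changed.
# The baseline keys are illustrative examples only.

BASELINE = {
    "default_acl": "private",       # storage objects private unless opted in
    "publish_sourcemaps": False,    # debug artifacts never leave the build
    "require_review": True,         # manual gate before public publication
}

def detect_drift(live: dict) -> dict:
    """Map each drifted key to an (expected, actual) pair."""
    return {
        key: (expected, live.get(key))
        for key, expected in BASELINE.items()
        if live.get(key) != expected
    }
```

Wired into a scheduled job, a non-empty result is the alert that would have surfaced both incidents internally, before an external researcher did.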
The broader lesson is not specific to Anthropic. Any organization building or deploying frontier AI systems faces the same pressure: capability development moves faster than security review, and the systems being built have capabilities that make their own operational security more consequential than it would be for conventional software. The incidents in late March 2026 are a concrete illustration of what happens when that pressure is not matched with equally rigorous infrastructure controls.
For teams evaluating their own readiness, the NeuralTrust Agentic AI Security Framework covers the control layers that apply across model deployment, tool access, and observability.