Claude Mythos: Dangerous AI Already Breached

Q: The Last Ones: A 32-Step Network Takeover

The UK’s AI Security Institute (AISI) independently evaluated Mythos Preview, and their findings were sobering. They built a simulation called “The Last Ones” (TLO): a 32-step corporate network attack spanning from initial reconnaissance to full network takeover — a task estimated to take human experts many hours, days, or weeks.

What Is Claude Mythos?

On April 7, 2026, Anthropic made an announcement unlike anything in the history of artificial intelligence: it had built a model so powerful and so dangerous that it would not be releasing it to the public. That model is Claude Mythos Preview — a general-purpose language model that sits above even Claude Opus in Anthropic’s lineup, and which has demonstrated an unprecedented ability to autonomously find and exploit software vulnerabilities.

Unlike ChatGPT, Claude, or any other consumer AI, Mythos was never intended for wide release. Instead, Anthropic quietly offered access to a handpicked consortium of dozens of corporations and critical infrastructure operators under an initiative called Project Glasswing — a defensive program meant to use Mythos’s capabilities to find vulnerabilities before malicious actors do.

The Project Glasswing consortium reads like a who’s who of the tech and finance world: Amazon, Apple, Google, Cisco, CrowdStrike, JPMorgan Chase, Microsoft, and Nvidia, among others. Notably absent is OpenAI, reportedly around six months behind Anthropic in developing a comparable model.

What Makes It So Powerful?

The capabilities Anthropic has disclosed — and that independent evaluators have confirmed — are nothing short of extraordinary.

Zero-Day Vulnerability Discovery at Scale

Mythos has reportedly identified thousands of high-severity zero-day vulnerabilities across every major operating system and web browser. Zero-days are previously unknown security flaws — the most prized and dangerous type of vulnerability in cybersecurity. Until now, discovering them required highly specialized human experts. With Mythos, Anthropic engineers with no formal security training can reportedly ask the model to find remote code execution vulnerabilities overnight.

Among its most striking finds: a 27-year-old vulnerability in OpenBSD and a 17-year-old remote code execution flaw in FreeBSD — bugs that had existed, undetected, for decades in systems trusted for their security reliability. One flaw was found in a line of code that had been tested five million times without detection. At the time of the April 7 announcement, 99% of the discovered vulnerabilities remained unpatched.

Autonomous Exploit Writing

According to Anthropic’s red-team blog, the company provided Mythos Preview with a list of 100 CVEs and memory corruption vulnerabilities from the Linux kernel. The model filtered these down to 40 potentially exploitable ones, then autonomously wrote working privilege escalation exploits — without any human intervention after the initial prompt. More than half of these attempts succeeded.

The Last Ones: A 32-Step Network Takeover

The UK’s AI Security Institute (AISI) independently evaluated Mythos Preview, and their findings were sobering. They built a simulation called “The Last Ones” (TLO): a 32-step corporate network attack spanning from initial reconnaissance to full network takeover — a task estimated to take human experts many hours, days, or weeks.

Claude Mythos Preview is the first AI model ever to complete TLO from start to finish, accomplishing it in 3 out of 10 attempts. Across all attempts, it completed an average of 22 out of 32 steps. The next best model, Claude Opus 4.6, averaged only 16 steps.

On expert-level capture-the-flag (CTF) security challenges — tasks that no AI model could complete before April 2025 — Mythos succeeded 73% of the time.

The AISI did note limitations: Mythos could not complete an operational technology (OT) focused range called “Cooling Tower,” and its success was demonstrated against smaller, weakly defended systems. “This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems,” the institute noted.

Project Glasswing: A Race Against the Clock

Rather than release Mythos publicly, Anthropic launched Project Glasswing as an emergency defensive consortium. The idea is straightforward but ambitious: use Mythos to find and patch vulnerabilities before attackers can exploit them.

The scope of the problem is staggering. Anthropic has found flaws in systems that are “10 or 20 years old,” suggesting that decades of accumulated technical debt in critical infrastructure could be exposed. The race to patch thousands of newly discovered zero-days — 99% of which were undefended at launch — has become a matter of national security urgency.

U.S. Treasury Secretary Scott Bessent convened a meeting of senior American bankers in Washington in April to discuss the Mythos threat, encouraging executives to use the model to detect vulnerabilities in their own systems. Major financial institutions — Goldman Sachs, Citigroup, Bank of America, and Morgan Stanley — are reportedly already testing it. Meanwhile, Axios reported that the National Security Agency (NSA) is also using Mythos.

The Breach: How the “Unhackable” AI Was Accessed Within 24 Hours

Here is where the story takes a sharply ironic turn.

On April 21, 2026 — just two weeks after Anthropic announced the model it said was too dangerous to release — Bloomberg broke a bombshell report: a private Discord group had gained unauthorized access to Claude Mythos Preview on the very day of its launch.

How They Did It

The breach was not a sophisticated hack. It was a combination of insider access and clever deduction:

A third-party contractor credential: A worker at an unnamed third-party vendor with authorized access to Mythos shared credentials with members of the Discord group.
A URL pattern guess: The group used internet sleuthing tools commonly used by cybersecurity researchers. They also leveraged information from a separate data breach at Mercor, an AI training startup that partners with several frontier AI labs, to guess the URL format Anthropic uses for its models.
The result: The group correctly identified the online location of Mythos Preview and has reportedly been using it freely ever since April 7.

The Discord group in question uses bots to monitor GitHub for information about unreleased AI models. Bloomberg confirmed the apparent breach by reviewing screenshots and watching a live demonstration provided by a member of the group.

Anthropic’s Response

“We’re investigating a report claiming unauthorized access to Claude Mythos Preview through one of our third-party vendor environments,” an Anthropic spokesperson said. The company stated there is currently no evidence that the unauthorized activity impacted Anthropic’s own systems or extended beyond the third-party vendor’s environment.

The anonymous source who spoke to Bloomberg claimed the group was “interested in playing around with new models, not wreaking havoc with them” — describing activities like “vibe coding tests.” But as analysts have pointed out: intent is not capability.

The Debate: Watershed Moment or Marketing Hype?

Not everyone is convinced Mythos represents an existential cybersecurity inflection point. Critics have raised the possibility that Anthropic — a company that has built its brand on being the most safety-conscious AI lab — may be amplifying the threat to generate interest in a limited commercial release.

“There is an element of marketing charm with it — it’s certainly got a lot of attention, and creating a limited release is one way to really get people charged up and excited about something,” said Joe Saunders, CEO of cybersecurity firm RunSafe Security, speaking to Foreign Policy.

The AISI’s qualifications are also worth remembering: Mythos succeeded against simulated, weakly defended environments — not real-world hardened systems. Its failure on the OT-focused Cooling Tower range suggests meaningful limitations.

Still, the independent weight of the AISI’s data — and the speed with which governments, banks, and intelligence agencies have mobilized — suggests there is real substance behind the headlines. As Turing Award winner Yoshua Bengio warned at the end of 2025, the ability of AI to autonomously discover zero-day vulnerabilities was a threshold he feared would eventually be crossed. With Mythos, it appears it has been.

What Comes Next?

OpenAI announced a similarly limited rollout of a cybersecurity-focused model just one week after Anthropic’s Mythos announcement, suggesting the competitive landscape is accelerating. The CFR notes OpenAI is approximately six months behind Anthropic in this race.

The breach, meanwhile, has exposed a fundamental contradiction at the heart of the Mythos project: Anthropic built an AI too dangerous to release, then struggled to control access to it within 24 hours of its launch. If the most safety-conscious AI lab in the world cannot keep its most dangerous model contained, it raises uncomfortable questions about what happens when the next lab — or the one after that — builds something comparable.

For now, the world’s most powerful hacking AI is being investigated by the company that built it, accessed by an anonymous Discord group with unknown intentions, and depended upon by governments and banks to defend the very infrastructure it could theoretically destroy.