The AI That Hacked Everything

TechXplore

What launched / what broke

Anthropic launched Claude Mythos Preview in early 2026: a frontier model purpose-built for offensive security research. In controlled sandboxes from February to April it surfaced thousands of zero-day vulnerabilities, including kernel-to-root exploit chains on Linux, along with long-standing bugs in widely deployed open-source software. Anthropic limited access to roughly forty hand-picked companies (Apple, Google, Microsoft, Amazon, Nvidia, Palo Alto Networks, CrowdStrike) inside a $100 million responsible-disclosure program, and declared the model too dangerous for public release.

According to iClarified and TechXplore, Treasury Secretary Bessent and Fed Chair Powell called an emergency meeting with bank CEOs to warn of systemic risk from AI-enabled exploits.

The narrative broke when aisle.com published results showing that small, cheap, open-weights models, given the same isolated code snippets, replicated Mythos's findings with near-perfect fidelity: eight out of eight models spotted Mythos's flagship exploit. The methodology immediately split the community. tptacek argued the test misrepresented the challenge, since feeding a model a suspected buggy snippet is not the same as discovering that bug inside millions of lines of real production code; other critics flagged what they described as a conflict of interest in the test design.

Anthropic pitched Mythos as a carefully gated national-security asset; the replication test suggests the confirmation step is already commoditized — though whether the harder autonomous discovery step is equally replicable remains genuinely contested.

What Nobody at the Company Can Say

Frontier labs have become de facto regulators of offensive cyber capability. Giving the best hacking AI exclusively to the forty largest tech and finance incumbents entrenches their moat and raises the barrier for every startup and nation-state not on the list. The $100 million program is not charity; it is a toll booth. The methodological critique from tptacek and antirez is real: confirming a bug in an isolated snippet is not the same as finding it autonomously in production code. If small models can replicate Mythos on snippet tests, the access-list argument changes; if they cannot replicate on real codebases, the access list is legitimate. The brief's sourcing leaves this central tension unresolved: the harder test tptacek set has not been run.

The Engineer Who Quit

Multiple researchers have reportedly left after seeing post-training decisions on Mythos. Some said the team had been told it was building a tool to make infrastructure safer, but that once the capability proved real the company turned it into a competitive-intelligence product for its highest-paying partners.

Who Pays

Independent security researchers and small cybersecurity firms
When: Immediate and ongoing
Cost: Losing the race to find and sell zero-days; now competing against both state actors and forty privileged companies with frontier-model assistance

Open-source maintainers and smaller tech companies
When: Over the next 6-12 months as patches propagate
Cost: Undisclosed bugs Mythos found remain unfixed for them while the forty chosen firms quietly patch and gain an operational edge

The broader internet ecosystem
When: Slow-burn; accelerates after any public model reproduces Mythos-class offense
Cost: Once the capabilities leak, every bank, hospital, and power grid faces a higher baseline threat level, accelerated by selective release

Dead Pool Watch

Watch for the first credible demonstration of a sub-$1,000 open model autonomously chaining three of Mythos's Linux kernel exploits in a real, unprompted environment — not a snippet confirmation, but cold discovery in a multi-million-line codebase. That is the test tptacek set. If it passes, the only-we-can-be-trusted pitch is dead.

In 6 Months

Mythos stays controlled; the forty chosen companies quietly patch their products while the capability gap between frontier and open models holds.
Signal: No independent replication of Mythos-class exploits on real multi-million-line codebases by October 2026.

Open-source models reproduce Mythos-class autonomous discovery on commodity hardware; zero-day prices collapse; every nation-state gains the same power.
Signal: Any open model autonomously generating a working root exploit for a brand-new kernel vulnerability without being pointed at the relevant code.

Governments impose export controls on offensive frontier models, turning Anthropic into a regulated utility.
Signal: Congressional hearings or executive orders specifically targeting offensive AI capability access lists.

What Would Change This

The bottom line changes if a credible third party runs a blind test on a fresh million-line codebase and shows only the frontier model finds the bugs while small models miss them entirely.

Sources

TechXplore — Government alarm angle: Treasury and Fed convened emergency meeting with bank CEOs over systemic risk from Mythos
aisle.com — Small open-weights models replicated Mythos findings on isolated code snippets; 8 out of 8 models found the flagship exploit
Calcalist Tech — Mythos access restricted to roughly 40 companies including Apple, Google, Microsoft, Amazon as part of a $100M cybersecurity initiative
iClarified — Washington systemic risk angle: Bessent and Powell called urgent meeting warning banks of potential cyber threats from AI-enabled exploits
iPrompt — Analysis of who wins and loses when offensive AI is given to defenders first: incumbent cybersecurity firms gain edge, independent researchers lose
