The AI That Hacked Everything
What Launched / What Broke
Anthropic launched Claude Mythos Preview in early 2026: a frontier model purpose-built to find offensive cybersecurity vulnerabilities. In controlled sandboxes from February to April it surfaced thousands of zero-day vulnerabilities, including kernel-to-root exploit chains on Linux, along with long-standing bugs in widely deployed open-source software. Anthropic limited access to roughly forty hand-picked companies (Apple, Google, Microsoft, Amazon, Nvidia, Palo Alto Networks, CrowdStrike) inside a $100 million responsible-disclosure program.

According to iClarified and TechXplore, Treasury Secretary Bessent and Fed Chair Powell called an emergency meeting with bank CEOs to warn of systemic risk from AI-enabled exploits. The model was declared too dangerous for public release.

The narrative broke when aisle.com published results showing that small, cheap, open-weights models, given the same isolated code snippets, replicated Mythos's findings with near-perfect fidelity: eight out of eight models spotted the flagship exploit. The methodology immediately split the community. tptacek argued the test misrepresented the challenge: feeding a model a suspected buggy snippet is not the same as discovering that bug inside millions of lines of real production code. Other community critics flagged what they described as a conflict of interest in the test design.
Anthropic pitched Mythos as a carefully gated national-security asset; the replication test suggests the confirmation step is already commoditized — though whether the harder autonomous discovery step is equally replicable remains genuinely contested.
What Nobody at the Company Can Say
Frontier labs have become de facto regulators of offensive cyber capability. Giving the best hacking AI exclusively to the forty largest tech and finance incumbents entrenches their moat and raises the barrier for every startup and nation-state not on the list. The $100 million program is not charity; it is a toll booth. Still, the methodological critique from tptacek and antirez is real: confirming a bug in an isolated snippet is not the same as finding it autonomously in production code. If small models can replicate Mythos only on snippet tests, the access list may be legitimate; if they can replicate it on real codebases, the access-list argument collapses. The brief's sourcing leaves that central tension unresolved, because the harder test tptacek set has not been run.
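The gap tptacek is pointing at can be made concrete. Here is a minimal sketch, with every name invented for illustration (neither aisle.com's harness nor Mythos's interface is public): a toy "model" that confirms a bug perfectly when handed the snippet can still fail discovery, because discovery is dominated by search over the codebase, not by judgment on the snippet.

```python
# Hypothetical sketch of the two evaluation regimes in dispute.
# All names (snippet_confirmation, cold_discovery, the toy model) are
# invented for illustration; no real harness or API is being described.
import random

def snippet_confirmation(model, snippet: str) -> bool:
    """Snippet-style test: hand the model the suspect code and ask
    whether it is vulnerable. Search space: one function."""
    return model(f"Is this code vulnerable?\n{snippet}")

def cold_discovery(model, codebase: list[str], budget: int) -> bool:
    """The harder test: the model must first locate the buggy snippet
    among all candidates before it can confirm anything. With a fixed
    query budget, success is bounded by search, not by judgment."""
    for snippet in random.sample(codebase, min(budget, len(codebase))):
        if model(f"Is this code vulnerable?\n{snippet}"):
            return True
    return False

# A toy "small model" that confirms perfectly when pointed at the bug.
BUG = "memcpy(dst, src, user_len);  /* no bounds check */"
SAFE = "memcpy(dst, src, sizeof dst);"
toy = lambda prompt: "user_len" in prompt

codebase = [SAFE] * 1_000_000 + [BUG]
print(snippet_confirmation(toy, BUG))             # True: confirmation is easy
print(cold_discovery(toy, codebase, budget=100))  # overwhelmingly likely False
```

Eight-for-eight on snippet confirmation demonstrates the first function. tptacek's objection is that Mythos's claimed value lies in the second, where the query budget over a multi-million-line codebase, not confirmation skill, is the binding constraint.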
The Engineer Who Quit
Multiple researchers have reportedly left after seeing post-training decisions on Mythos. Some said the team had been told it was building a tool to make infrastructure safer, but that the moment the capability proved real, the company turned it into a competitive-intelligence product for its highest-paying partners.
Who Pays
Who: Independent security researchers and small cybersecurity firms
When: Immediate and ongoing
Cost: Losing the race to find and sell zero-days; now competing against both state actors and forty privileged companies with frontier-model assistance

Who: Open-source maintainers and smaller tech companies
When: Over the next 6-12 months as patches propagate
Cost: Undisclosed bugs Mythos found remain unfixed for them while the forty chosen firms quietly patch and gain an operational edge

Who: The broader internet ecosystem
When: Slow-burn; accelerates after any public model reproduces Mythos-class offense
Cost: Once the capabilities leak, every bank, hospital, and power grid faces a higher baseline threat level, accelerated by selective release
Dead Pool Watch
Watch for the first credible demonstration of a sub-$1,000 open model autonomously chaining three of Mythos's Linux kernel exploits in a real, unprompted environment — not a snippet confirmation, but cold discovery in a multi-million-line codebase. That is the test tptacek set. If it passes, the only-we-can-be-trusted pitch is dead.
In 6 Months
Mythos stays controlled; the forty chosen companies quietly patch their products while the capability gap between frontier and open models holds
Signal: No independent replication of Mythos-class exploits on real multi-million-line codebases by October 2026
Open-source reproduces Mythos-class autonomous discovery on commodity hardware; zero-day prices collapse; every nation-state gains the same power
Signal: Any open model autonomously generating a working root exploit for a brand-new kernel vulnerability without being pointed at the relevant code
Governments impose export controls on offensive frontier models, turning Anthropic into a regulated utility
Signal: Congressional hearings or executive orders specifically targeting offensive AI capability access lists
What Would Change This
The bottom line changes if a credible third party runs a blind test on a fresh million-line codebase and shows only the frontier model finds the bugs while small models miss them entirely.
Related
Anthropic Hit $30B ARR. The Revenue Race Is Now About Lawyers.
drama · Anthropic Is 'Exploring' Custom Chips. That Is the Most Expensive Way to Say No.
culture · Karpathy Says There's a Growing Gap in AI Understanding. He Called It a Perception Problem. It's Actually a Pricing Problem.
money · Michael Burry Is Betting $1 Billion That Anthropic Eats Palantir's Lunch. He's Half Right.