evaluation
culture
2026-04-12
Berkeley Built a Bot That Aced Every AI Benchmark. Without Solving Anything.
Ten lines of Python beat SWE-bench. The entire evaluation industry just became a liability.