ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity
AI systems are now better than expert biologists at key lab tasks
Large language models can now outperform experienced human biologists at critical laboratory work, including writing code for lab robots, designing DNA sequences, and even evading DNA synthesis safeguards. In real-world tests, one AI system successfully assembled DNA molecules using a robotic platform, suggesting that these tools have crossed from theoretical capability into practical biological execution.
AI systems that can autonomously perform advanced biology work accelerate legitimate research and drug discovery, but they also lower the technical barrier for dangerous applications. Because current AI agents already beat expert humans on biosecurity-relevant tasks, new screening and safety measures are needed now, before these capabilities become cheaper and more widespread. This benchmark gives biosecurity researchers a concrete way to track how quickly AI is advancing into sensitive domains.