Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Computer Science · AI · May 29, 2026

Why AI coding agents need human physics experts to catch invisible mistakes

Nhat-Minh Nguyen
arXiv:2605.30353

Summary

A physicist supervised an AI coding agent as it built specialized physics software over 12 days and found that the agent solved only 12 of 15 problems on its own. The three failures shared one flaw: the agent treated surface-level symptoms as root causes, either getting stuck optimizing the wrong code structure or inventing spurious corrections that passed tests but had no physical meaning. Good supervision practices (testing at extreme parameter values, tracking exploration across sessions, and forbidding numerical shortcuts) caught what automated tests missed.
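
To make "testing at extreme parameter values" concrete, here is a minimal toy sketch. It is not taken from the paper: the relativistic kinetic energy example, the function names, and the tolerances are all assumptions chosen for illustration. It shows how a tuned correction can pass a test at one nominal point and still fail once the known physical limits are checked.

```python
import math

# Toy illustration only (not from the paper). Units with c = 1.

def kinetic_energy(m, v):
    """Relativistic kinetic energy, (gamma - 1) * m."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma - 1.0) * m

def kinetic_energy_patched(m, v):
    """Hypothetical 'fake correction': the Newtonian formula times a
    fudge factor tuned so the result matches at v = 0.5 only."""
    fudge = kinetic_energy(1.0, 0.5) / (0.5 * 0.5 ** 2)
    return fudge * 0.5 * m * v * v

def passes_nominal_test(ke, tol=1e-9):
    """Single-point regression test at v = 0.5."""
    expected = 1.0 / math.sqrt(1.0 - 0.25) - 1.0
    return abs(ke(1.0, 0.5) - expected) <= tol

def passes_limit_tests(ke, tol=1e-5):
    """Checks at extreme parameter values."""
    # Slow limit: the result must reduce to the Newtonian 0.5 * m * v^2.
    v = 1e-3
    newtonian_ok = abs(ke(1.0, v) / (0.5 * v * v) - 1.0) <= tol
    # Fast limit: gamma is about 22 at v = 0.999, so the energy must
    # greatly exceed the rest mass instead of staying below it.
    grows_ok = ke(1.0, 0.999) > 10.0
    return newtonian_ok and grows_ok

if __name__ == "__main__":
    for ke in (kinetic_energy, kinetic_energy_patched):
        print(ke.__name__, passes_nominal_test(ke), passes_limit_tests(ke))
```

Both functions pass the nominal test, but only the physically motivated one passes the limit tests, which is the kind of gap a suite built around nominal values would not reveal.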

Why it matters

As AI agents take on scientific coding tasks, this work reveals a hard limit: they cannot reliably distinguish code that looks right from code that is actually correct. An agent might produce code that passes every test yet encodes wrong physics and predicts nonsensical results in new situations. The takeaway for teams building scientific software with AI is that they need strict human oversight of architecture choices and physical assumptions, not just final code review, and that no amount of scaling will fix an agent's inability to reason about whether its solutions represent reality.

Posted on arXiv · May 28, 2026