Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
When should AI judges actually think through their decisions?
Reasoning-capable AI judges dramatically improve accuracy on complex tasks such as math and code verification but waste computation on simpler evaluations, which suggests they should be deployed selectively rather than everywhere. The researchers developed RACER, a system that automatically routes each task to either a reasoning judge or a fast judge based on difficulty and cost, maintaining accuracy while staying within a fixed compute budget even when the mix of task types shifts unexpectedly.
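To make the routing idea concrete, here is a minimal sketch of budget-constrained routing. It is not RACER's actual algorithm, which this summary does not specify; it is a simple greedy heuristic that sends the tasks with the highest estimated accuracy gain per unit of extra cost to the reasoning judge until the budget runs out. The names `Task`, `est_gain`, `extra_cost`, and `route` are hypothetical, the gain and cost estimates are assumed to come from some upstream predictor, and the sketch does nothing to handle shifts in task types.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    est_gain: float    # assumed: predicted accuracy gain from the reasoning judge
    extra_cost: float  # assumed: extra compute of reasoning over the fast judge


def route(tasks, budget):
    """Greedy illustration only, not RACER's method.

    Every task defaults to the fast judge. Tasks are then upgraded to the
    reasoning judge in order of estimated gain per unit of extra cost,
    as long as the remaining budget covers the upgrade.
    """
    routes = {t.task_id: "fast" for t in tasks}
    # Only tasks with a positive expected gain and a positive cost are candidates.
    candidates = [t for t in tasks if t.est_gain > 0 and t.extra_cost > 0]
    candidates.sort(key=lambda t: t.est_gain / t.extra_cost, reverse=True)

    remaining = budget
    for t in candidates:
        if t.extra_cost <= remaining:
            routes[t.task_id] = "reasoning"
            remaining -= t.extra_cost
    return routes


if __name__ == "__main__":
    demo = [
        Task("math", est_gain=0.30, extra_cost=10.0),
        Task("chat", est_gain=0.01, extra_cost=10.0),
        Task("code", est_gain=0.25, extra_cost=10.0),
    ]
    print(route(demo, budget=20.0))
```

With a budget that covers two upgrades, the math and code tasks go to the reasoning judge and the chat task stays on the fast judge. A greedy ratio rule like this is only a baseline: it trusts the gain estimates completely, whereas the system described above is meant to hold up when those estimates drift.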
LLM-as-a-judge systems are increasingly used to grade student work, evaluate code, and validate outputs in production. Making these systems smarter about when to engage expensive reasoning cuts computational waste while maintaining accuracy, which is crucial for companies running evaluations at scale, where every percentage point of wasted compute multiplies across millions of judgments.