ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
One special word that lets AI think visually without slowing down
Researchers created ATLAS, a system in which a single special word (a token) acts as both a visual reasoning step and an executable operation, which avoids the computational cost of generating intermediate images. The approach outperforms existing methods on visual reasoning benchmarks while remaining compatible with standard AI training techniques.
Current AI systems that reason about images either generate entire intermediate pictures, which is expensive and slow, or rely on hidden (latent) calculations that don't generalize well. ATLAS sidesteps this tradeoff by embedding visual reasoning into a single token that is processed like normal text, making visual reasoning faster and more practical to deploy. This could meaningfully reduce the computational cost of AI systems that need to understand images and work through complex visual problems step by step.