PAPER PLAINE

Fresh research, simply explained. Updates twice daily.

When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

Pairing quick AI reflexes with slow, careful thinking for better decisions

A hybrid system called PACT combines a fast, instinctive AI policy with a small language model that stops to think and plan. When the fast policy encounters an unfamiliar situation, it calls on the language model to generate and test action plans before committing to one. On difficult navigation tasks, the combined system dramatically outperforms either approach alone.
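To make the idea concrete, here is a minimal Python sketch of a "plan when in doubt, then commit" loop. It is an illustration only, not PACT's actual implementation. The summary does not say how unfamiliar situations are detected or how plans are tested, so the confidence threshold, the grid world, and every name here (GatedAgent, toy_policy, toy_proposals, toy_score) are assumptions made for the example. The language model is replaced by a hard-coded stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]   # hypothetical grid position (x, y)
Action = str

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL: State = (3, 0)      # hypothetical goal cell


def step(state: State, action: Action) -> State:
    """Apply one move on the toy grid."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


@dataclass
class GatedAgent:
    # Fast reactive policy: returns (action, confidence in [0, 1]).
    fast_policy: Callable[[State], Tuple[Action, float]]
    # Stand-in for the small language model: returns candidate plans.
    propose_plans: Callable[[State], List[List[Action]]]
    # Stand-in for plan testing: higher score means a better plan.
    score_plan: Callable[[State, Sequence[Action]], float]
    # Assumed gating rule: deliberate when confidence falls below this.
    threshold: float = 0.5
    # Remaining actions of the plan currently being executed.
    committed: List[Action] = field(default_factory=list)

    def act(self, state: State) -> Action:
        # While a plan is committed, keep executing it without re-deciding.
        if self.committed:
            return self.committed.pop(0)
        action, confidence = self.fast_policy(state)
        if confidence >= self.threshold:
            return action                      # familiar case: use the reflex
        plans = [p for p in self.propose_plans(state) if p]
        if not plans:
            return action                      # nothing to deliberate over
        best = max(plans, key=lambda p: self.score_plan(state, p))
        self.committed = list(best)            # commit to the winning plan
        return self.committed.pop(0)


def toy_policy(state: State) -> Tuple[Action, float]:
    # Confident only at the start cell; elsewhere it guesses badly.
    return ("right", 0.9) if state == (0, 0) else ("up", 0.1)


def toy_proposals(state: State) -> List[List[Action]]:
    # A real system would query a language model here.
    return [["up", "up"], ["right", "right"]]


def toy_score(state: State, plan: Sequence[Action]) -> float:
    # Roll the plan forward and score by negative distance to the goal.
    for action in plan:
        state = step(state, action)
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


agent = GatedAgent(toy_policy, toy_proposals, toy_score)
```

Calling agent.act(state) once per step would use the fast policy at the start cell, switch to planning at the next cell where confidence is low, and then follow the chosen plan to its end before consulting the fast policy again. Committing to the whole plan, rather than re-deciding at every step, is one plausible reading of "committed" in the paper's title.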

AI systems deployed in the real world, such as robots, autonomous vehicles, and other safety-critical systems, often fail when they encounter situations they weren't trained on. PACT shows that adding a deliberative planning step can catch and prevent these failures without retraining the core system, making existing AI safer and more reliable when conditions change unexpectedly.