Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Computer Science · AI · Jun 25, 2026

How training data decides which AI rules stick around and which get forgotten

Juliana Li, Diya Sreedhar
arXiv:2606.26050

Summary

Language models learn rules like "girl names go with she" during training, but then unlearn them mid-run, even when the supporting evidence stays in the data. The researchers found that this "natural ungrokking" follows a simple pattern: whichever of several competing patterns appears most often in the training stream survives, while less frequent competitors are displaced and forgotten. The forgetting is one-way: a rule can be killed by removing its support, but flooding the data with examples of the rule does not bring it back once it is gone.
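
The "most frequent pattern wins" idea can be illustrated with a toy frequency count. This is only a sketch under stated assumptions, not the paper's method: the labels `rule_A` and `rule_B`, the function `predicted_survivor`, and the example stream are all hypothetical, and the code deliberately ignores the one-way nature of the forgetting.

```python
from collections import Counter

def predicted_survivor(stream):
    """Return the most frequent pattern label in a stream, plus all counts.

    `stream` is a list of labels, one per training example, naming the
    competing pattern that example supports (a hypothetical encoding).
    """
    counts = Counter(stream)
    # most_common(1) yields the single label with the highest count
    winner, _ = counts.most_common(1)[0]
    return winner, counts

# Hypothetical stream: 3 examples support rule_A, 7 support rule_B
toy_stream = ["rule_A"] * 3 + ["rule_B"] * 7
winner, counts = predicted_survivor(toy_stream)
print(winner, dict(counts))
```

Under this simplification, the less frequent `rule_A` is the one expected to be displaced. A real analysis would need to measure pattern frequency over the actual order of the training stream rather than a static list.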

Why it matters

This shows that messy real-world training data, not just model size or architecture, shapes which behaviors persist in AI systems. If a model forgets a useful rule because conflicting signals are more common in the wild, retraining on cleaner data might not restore it. Understanding this could help engineers design training corpora that preserve desired behaviors and predict when a model will abandon an important pattern mid-training.

Posted on arXiv · Jun 24, 2026