LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
Why making AI models bigger sometimes makes them worse
Large language models stop improving, and sometimes get worse, when you scale them up without careful balance, much as adding noise to a radio signal eventually drowns out the message. Researchers applied Shannon's information theory, which originally explained how much data can travel reliably through a noisy communication channel, to model training and found that it predicts this counterintuitive breakdown far better than existing scaling laws.
Teams building AI models currently spend billions scaling up compute and data on the assumption that bigger always means better. This framework shows there is a ceiling, a signal-to-noise ratio threshold, beyond which throwing more resources at training actually degrades performance. The predictions hold up across different model sizes and perturbations, so practitioners can estimate where that threshold lies before wasting compute, and researchers have a principled way to understand when and why scaling strategies fail.
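For readers who want the radio analogy in concrete terms, here is a minimal sketch of the classical Shannon-Hartley capacity formula, C = B * log2(1 + S/N), for a channel with additive Gaussian noise. This is the textbook communication result only; it is not the researchers' model of training, and the function names and sample numbers are illustrative assumptions. It shows that capacity depends on the ratio of signal to noise, so more signal helps only if noise does not grow with it.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a plain ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Classical Shannon-Hartley capacity in bits per second.

    Assumes an additive white Gaussian noise channel. The inputs are
    illustrative; nothing here is taken from the summarized study.
    """
    if bandwidth_hz < 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and signal must be >= 0, noise must be > 0")
    snr = signal_power / noise_power
    return bandwidth_hz * math.log2(1.0 + snr)


if __name__ == "__main__":
    bandwidth = 1.0  # normalized bandwidth, so capacity is in bits per channel use

    baseline = shannon_capacity(bandwidth, signal_power=3.0, noise_power=1.0)
    # Doubling signal and noise together leaves the ratio, and the capacity, unchanged.
    scaled_both = shannon_capacity(bandwidth, signal_power=6.0, noise_power=2.0)
    # Adding noise without adding signal lowers the capacity.
    noisier = shannon_capacity(bandwidth, signal_power=3.0, noise_power=3.0)

    print(f"baseline capacity:        {baseline:.3f}")
    print(f"signal and noise doubled: {scaled_both:.3f}")
    print(f"noise tripled only:       {noisier:.3f}")

    # Capacity at a few signal-to-noise ratios given in decibels.
    for snr_db in (-10, 0, 10, 20):
        ratio = db_to_linear(snr_db)
        print(f"{snr_db:>4} dB -> {shannon_capacity(bandwidth, ratio, 1.0):.3f}")
```

The design choice to pass signal and noise power separately, rather than a single ratio, is deliberate: it makes it easy to see that scaling both together changes nothing, which is the intuition behind a signal-to-noise ceiling.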