PAPER PLAINE

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

Why AI systems built from multiple chatbots often break basic logic rules

When large language models are assembled into multi-part systems, each component can be internally consistent while the combined outputs violate fundamental rules of probability. This failure occurs in between one-third and nearly all component combinations in real systems. The researchers created a mathematical measure of this incoherence that can be calculated from a system's actual output, predicted its magnitude with 93% accuracy on most problem types, and demonstrated that standard fixes such as better prompting or retrieval methods do not resolve the issue.
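
To make "locally coherent, globally incoherent" concrete, here is a minimal sketch of one way combined outputs can break a probability rule. It is an illustration only and is not the measure defined in the paper. It assumes three hypothetical components that separately report P(A), P(B), and P(A and B), and it checks those numbers against the standard Fréchet bounds, which any single consistent probability distribution must satisfy. The function name `frechet_violation` is invented for this example.

```python
def frechet_violation(p_a: float, p_b: float, p_ab: float) -> float:
    """Return how far p_ab falls outside the Frechet bounds for P(A and B).

    Any coherent joint distribution satisfies
        max(0, p_a + p_b - 1) <= p_ab <= min(p_a, p_b).
    A return value of 0.0 means the three numbers are jointly coherent.
    Illustrative check only; not the incoherence measure from the paper.
    """
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    if p_ab < lower:
        return lower - p_ab
    if p_ab > upper:
        return p_ab - upper
    return 0.0


# Hypothetical outputs from three separate components. Each number is a
# valid probability on its own, but the conjunction (0.5) exceeds P(A) = 0.3,
# which no single probability distribution allows.
gap = frechet_violation(p_a=0.3, p_b=0.8, p_ab=0.5)
print(f"violation size: {gap:.2f}")
```

A nonzero result flags a combination that no consistent set of beliefs could produce, even though each component's answer looks reasonable in isolation.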

AI agents that make decisions by combining outputs from multiple language models, used in everything from medical diagnosis assistants to financial forecasting, can appear confident while producing logically impossible conclusions. Because this failure can be measured and detected at runtime, developers can catch these breakdowns before deployment. The finding that typical mitigation strategies fail suggests the problem requires fundamental architectural changes rather than prompt-engineering fixes.