PAPER PLAINE

Fresh research, simply explained. Updates twice daily.

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

Teaching AI to pay attention using pure geometry instead of learned rules

A new attention mechanism for AI treats tokens as geometric transformations (rotations, reflections, shears) rather than as vectors with learned features. It scores the relationship between two tokens by the intrinsic distance between their transformations instead of by a learned kernel, and it handles geometric groups that existing methods cannot, such as rotations in 3D space or 2D affine transformations with scaling. In sequence-completion tests, it matched learned approaches with 50 to 80 times fewer parameters and broke no geometric rules, while standard vector-based attention produced constraint violations trillions of times larger.

Most AI attention mechanisms are built on learned, data-dependent rules that can violate the geometric structure they're meant to preserve. This construction builds attention directly from mathematical geometry, guaranteeing that transformations remain valid by design rather than by luck. That matters for any system working with structured spatial data—robotics, 3D vision, medical imaging, physical simulations—where breaking geometric consistency causes failures downstream.
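To make the scoring idea concrete, here is a minimal sketch for one group, 3D rotations (SO(3)), in Python with NumPy. It is not the paper's construction: using the squared rotation angle as the distance, the temperature `tau`, and averaging relative rotations in the Lie algebra to form each output are assumptions chosen for illustration, and all function names are hypothetical. What it shows is that when scores come from a geometric distance and outputs are built with the exponential map, every output is a valid rotation by construction.

```python
import numpy as np

def hat(v):
    """Map a 3-vector to its skew-symmetric matrix, an element of so(3)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(v):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = np.linalg.norm(v)
    K = hat(v)
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def so3_log(R):
    """Log map SO(3) -> so(3) as an axis-angle 3-vector whose norm is the
    rotation angle. Assumes the angle stays well below pi (unstable there)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-12:
        return 0.5 * v
    return (theta / (2.0 * np.sin(theta))) * v

def geodesic_attention(rotations, tau=1.0):
    """Score token pairs by squared geodesic distance (rotation angle of
    R_i^T R_j), softmax the scores, and average the relative rotations in
    the Lie algebra so that every output is again a rotation matrix."""
    outputs = []
    for R_i in rotations:
        # Relative rotations expressed in the Lie algebra (axis-angle vectors)
        logs = np.array([so3_log(R_i.T @ R_j) for R_j in rotations])
        scores = -np.sum(logs**2, axis=1) / tau   # closer tokens score higher
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                   # softmax over tokens
        outputs.append(R_i @ so3_exp(weights @ logs))
    return outputs

rng = np.random.default_rng(0)
tokens = [so3_exp(rng.normal(scale=0.3, size=3)) for _ in range(4)]
for R in geodesic_attention(tokens):
    # Each output should be orthogonal with determinant +1
    print(np.allclose(R.T @ R, np.eye(3)), round(float(np.linalg.det(R)), 6))
```

Because each output is R_i times a matrix exponential, nothing in the update can push it off the group; a plain weighted average of the rotation matrices themselves would generally not be a rotation.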

Topological Codes Based on Space Groups

Building quantum error-correction codes with less repetitive structure

Researchers expanded how to build topological codes—a leading approach to protecting quantum computers from errors—by relaxing the requirement that they repeat perfectly across space. The new codes combine translation symmetry with rotations and reflections, and surprisingly, they can require fewer qubits in practice than the standard designs, making them simpler to build.

Quantum computers remain fragile, and error correction is essential before they can solve real problems. This work expands the toolkit for designing error-correcting codes that fit better with actual quantum hardware, potentially reducing the number of physical qubits needed to run a reliable quantum computer.

Forecasting AI-Era Productivity: The Intellectually Converged Human Framework and a Missing Cognitive Mediator in Production Function Theory

Why AI investments fail without developing workers' ability to use them

Massive spending on artificial intelligence hasn't delivered the expected productivity gains, the authors argue, because companies deploy AI without first building workers' capacity to use it effectively. Their framework finds that the match between AI availability and what they call "convergence capacity" (a combination of practical understanding, self-awareness, flexible thinking, and the ability to connect ideas) accounts for 86% of the productivity differences across wealthy nations, compared with just 31% for AI deployment alone.

Countries and companies are pouring billions into AI tools that sit underutilized because workers lack the cognitive skills to integrate them into their jobs. South Korea exemplifies the problem: despite strong workforce education and significant AI investment, its low convergence capacity has yielded minimal productivity gains. The framework suggests that before buying more AI, organizations need to invest in training that builds workers' ability to learn across domains, think flexibly, and adapt, a shift that could unlock trillions in stranded AI value.

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

How flawed AI judges infect each other's decisions in multi-agent systems

When AI language models evaluate each other's work in team settings, their biases spread from one agent to the next, even when every agent is the same model. The researchers measured contagion coefficients between 0.157 and 0.352 when biased evaluators were present, and found that adding just two more evaluators to the review process cut this bias spread by 72%, a simple fix.

AI systems increasingly rely on other AIs to check their work. If one model's judgment bias infects the rest of the team, bad decisions compound across the entire network. This research shows you can dramatically reduce that contamination by using evaluation committees instead of single judges—a practical safeguard for any system where AI agents depend on each other's feedback.
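Why would a committee help? A toy calculation gives one intuition. The sketch below assumes that exactly one of k evaluators is biased and that their scores are combined by a mean or a median; the function name is hypothetical, and it does not reproduce the paper's contagion coefficient. The 72% reduction reported in the study is a measured result on real LLM agents, not something this model predicts. Under mean aggregation, going from one evaluator to three cuts the surviving bias by two-thirds, and a median discards a single outlier entirely.

```python
import statistics

def surviving_bias(bias, k, aggregator="mean"):
    """Toy model: one of k evaluators adds `bias` to its score and the other
    k-1 are unbiased. Returns how much of that bias survives aggregation."""
    deviations = [bias] + [0.0] * (k - 1)   # each evaluator's error vs. the fair score
    if aggregator == "mean":
        return statistics.mean(deviations)
    if aggregator == "median":
        return statistics.median(deviations)
    raise ValueError(f"unknown aggregator: {aggregator}")

for k in (1, 3, 5):
    print(f"k={k}: mean keeps {surviving_bias(0.3, k):.3f}, "
          f"median keeps {surviving_bias(0.3, k, 'median'):.3f}")
```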

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

A faster way to generate realistic 3D medical scans from scratch

Researchers built a new AI system that can create high-resolution 3D CT scans of the chest and lungs with fine detail intact, without the computational bottlenecks that slow down existing methods. The system works in two stages: first handling large-scale structures, then filling in subtle details—an approach that outperformed competing methods on standard medical imaging benchmarks.

CT scans are expensive and expose patients to radiation, so generating realistic synthetic ones could reduce both costs and unnecessary imaging in research and clinical training. A faster, more efficient generation method means hospitals could use synthetic scans to train AI diagnostic tools and to give clinicians practice with rare cases without scanning additional patients. This could accelerate the development of more reliable medical AI while protecting patient privacy.
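The two-stage idea (large-scale structure first, fine detail second) can be illustrated with a coarse-plus-residual split of a volume. This is a minimal NumPy sketch under stated assumptions: average pooling for the coarse stage, nearest-neighbour upsampling, and a random array standing in for a CT scan; the helper names are hypothetical. It performs no diffusion and generates nothing. It only shows that a coarse volume plus a residual reconstructs the original exactly, and that the residual carries only fine detail, since it averages to zero inside every coarse voxel.

```python
import numpy as np

def downsample(vol, f):
    """Average-pool a 3D volume by an integer factor f along each axis."""
    d, h, w = (s // f for s in vol.shape)
    blocks = vol[:d * f, :h * f, :w * f].reshape(d, f, h, f, w, f)
    return blocks.mean(axis=(1, 3, 5))

def upsample(vol, f):
    """Nearest-neighbour upsampling: repeat each voxel f times per axis."""
    return vol.repeat(f, axis=0).repeat(f, axis=1).repeat(f, axis=2)

rng = np.random.default_rng(0)
volume = rng.normal(size=(32, 32, 32))        # stand-in for a CT volume
coarse = downsample(volume, 4)                # stage 1 target: global structure
residual = volume - upsample(coarse, 4)       # stage 2 target: fine detail
reconstructed = upsample(coarse, 4) + residual

print(coarse.shape, np.allclose(reconstructed, volume))
```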

AI Economist Agent: An Agentic Framework for Model-Grounded Economic Analysis with RAG, Knowledge Graphs, and Large Language Models

Teaching AI to explain economics using real data and tested theories

Researchers built an AI economist that generates economic reports and analyses by anchoring its claims to actual data and economic theory, rather than just producing plausible-sounding narratives. When tested on inflation forecasts and bank stress scenarios, the system produced more coherent and traceable explanations than language models working alone.

Economic analysis shapes real decisions—from Federal Reserve policy to bank lending rules—so explanations need to be trustworthy and defensible, not just fluent. This framework makes AI-generated economic reasoning transparent and checkable against actual models and evidence, reducing the risk of confident-sounding but unfounded claims influencing financial decisions.

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

A handful of fashion and appearance cues drive how AI judges people

Multimodal AI models, which read images as well as text, make sweeping social judgments about people based on surprisingly few visual signals, mainly clothing style, age, and body type. Researchers tested six major AI systems on 25,000 carefully controlled images in which only one attribute changed at a time, and found that just 15 visual cues account for nearly 80% of all the biased judgments these models make.

These AI models are already screening job applicants, assessing loan eligibility, and making other high-stakes decisions about real people. If a model judges someone's trustworthiness or earning potential based primarily on their clothes or perceived age, it can systematize discrimination at scale. This benchmark gives developers a concrete way to test and fix these specific weak points before deploying systems in consequential settings.
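A statistic like "15 cues account for nearly 80% of biased judgments" can be computed once each cue's effect has been measured from image pairs that differ only in that cue. The sketch below assumes those effects are already available as a dictionary; the cue names and values are invented for illustration, and summing absolute effects is one plausible way to define each cue's share, not necessarily the paper's.

```python
def cues_for_share(effects, share=0.8):
    """Smallest number of cues whose summed absolute effects reach `share`
    of the total absolute effect across all cues."""
    magnitudes = sorted((abs(e) for e in effects.values()), reverse=True)
    total = sum(magnitudes)
    running = 0.0
    for k, m in enumerate(magnitudes, start=1):
        running += m
        if running >= share * total:
            return k
    return len(magnitudes)

# Invented effect sizes: change in a model's bias score when only this cue is flipped
effects = {
    "clothing_style": 0.30,
    "apparent_age": 0.25,
    "body_type": 0.15,
    "glasses": 0.05,
    "hair_color": 0.03,
    "background": -0.02,
}
print(cues_for_share(effects, 0.8))  # 3
```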

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

AI that reasons through a patient's complete medical history to guide treatment decisions

Most medical AI answers isolated questions quickly but struggles when the real answer requires connecting facts scattered across patient records, images, and sensor data. MedRLM instead builds a dynamic "evidence map" that recursively searches through a patient's full medical picture—text notes, imaging, heart rhythms, blood pressure trends, and clinical guidelines—activating deeper analysis when abnormal patterns appear, then flags cases for human review when confidence is low.

Healthcare providers in rural or under-resourced areas often lack specialists to review complex cases. A system that can systematically extract and connect evidence across all available patient data, then decide whether a case needs referral to a tertiary hospital, could reduce delays in care and improve triage accuracy. The framework's built-in uncertainty checking is also designed to guard against overconfident recommendations that might lead clinicians astray.

CLUSTER: Derivative-free optimization of smooth functions with parameter-change costs

Speeding up lab experiments when moving between settings costs time and money

A new algorithm called CLUSTER optimizes laboratory experiments about 50% faster than existing methods when there's a penalty for adjusting each parameter or group of parameters—such as when a robot must physically reposition equipment. The approach works especially well for real-world lab setups like optics experiments, and outperforms popular alternatives like Bayesian optimization.

Robot-controlled labs waste time and resources repositioning equipment for every small parameter adjustment. CLUSTER reduces this waste by choosing more carefully which parameters to change together, which shortens experiments. For labs running hundreds of optimization experiments, from drug discovery to materials science, a 50% speedup could translate directly into faster results and lower costs.
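What makes this problem different from ordinary optimization is the cost model. Below is a minimal sketch of one plausible version, assuming parameters are grouped (for example by the hypothetical optical mount they sit on) and that each group charges a fixed cost whenever any of its parameters moves. It is not CLUSTER itself, and the function names are hypothetical; it only shows why changing parameters together can pay off, since moving two parameters on the same mount in one step costs the same as moving one.

```python
def switching_cost(x_old, x_new, groups, group_costs):
    """Cost of moving from settings x_old to x_new: each parameter group pays
    its fixed cost once if any parameter inside it changes (assumed model)."""
    total = 0.0
    for group, cost in zip(groups, group_costs):
        if any(x_old[i] != x_new[i] for i in group):
            total += cost
    return total

def plan_cost(settings, groups, group_costs):
    """Total switching cost of visiting a sequence of settings in order."""
    return sum(switching_cost(a, b, groups, group_costs)
               for a, b in zip(settings, settings[1:]))

# Hypothetical setup: parameters 0-1 sit on mount A, parameters 2-3 on mount B
groups = [(0, 1), (2, 3)]
group_costs = [5.0, 5.0]
start = [0.0, 0.0, 0.0, 0.0]
target = [0.1, 0.2, 0.0, 0.0]

one_at_a_time = [start, [0.1, 0.0, 0.0, 0.0], target]   # mount A moved twice
together = [start, target]                               # mount A moved once
print(plan_cost(one_at_a_time, groups, group_costs))  # 10.0
print(plan_cost(together, groups, group_costs))       # 5.0
```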

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Testing whether AI coding assistants work equally well in twelve languages, not just Python

Researchers expanded a major AI coding benchmark from Python alone to twelve programming languages, revealing that large language models perform significantly worse in non-Python languages even on identical tasks. The evaluation of 24 models uncovered clear evidence that AI systems are overtrained on Python and struggle with language-specific code patterns.

Most programming benchmarks only test AI in Python, so companies have no reliable way to know whether these tools will work for their JavaScript, Java, C++, or Go codebases. This benchmark exposes real performance gaps that developers will encounter in practice, pushing AI model builders to create systems that actually generalize across the languages used in professional software development.

Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes

A security checkpoint that stops AI agents from making unauthorized changes to cloud systems

Autonomous agents controlling cloud infrastructure need a hard stop between decision and action. This paper introduces the Sovereign Execution Broker, a system that sits between an AI agent's proposed changes and the actual infrastructure, verifying that each change matches what was explicitly approved and hasn't been revoked—then recording exactly what happened. The authors tested it on AWS and Kubernetes clusters and found it adds minimal latency while catching unauthorized mutations.

As AI agents gain direct control over production systems, a single compromised or hallucinating agent could cause widespread damage before anyone notices. This broker creates a tamper-proof record and a mandatory verification point that can't be bypassed, letting companies revoke agent permissions instantly and audit every change. In regulated industries like finance and healthcare, having a signed, auditable trail of who authorized what change and when could be legally required.
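The check-then-record flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's design: an HMAC with a shared demo key stands in for certificate-based signatures, an in-memory set stands in for a revocation list, and a hash-chained list stands in for the audit record. All names are hypothetical. A change runs only if its approval is authentic, was issued for exactly that change, and has not been revoked; every attempt is logged either way.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; a real broker would verify certificates instead.
APPROVAL_KEY = b"demo-approval-key"

def change_digest(change):
    """Canonical SHA-256 digest of a proposed infrastructure change."""
    return hashlib.sha256(json.dumps(change, sort_keys=True).encode()).hexdigest()

def sign(approval_id, digest):
    return hmac.new(APPROVAL_KEY, f"{approval_id}:{digest}".encode(),
                    hashlib.sha256).hexdigest()

def approve(change, approval_id):
    """Issue an approval bound to exactly this change."""
    digest = change_digest(change)
    return {"id": approval_id, "digest": digest, "sig": sign(approval_id, digest)}

class Broker:
    def __init__(self):
        self.revoked = set()
        self.log = []                 # hash-chained audit log
        self._last_hash = "0" * 64

    def revoke(self, approval_id):
        self.revoked.add(approval_id)

    def _record(self, entry):
        entry = dict(entry, prev=self._last_hash)
        self._last_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.log.append(dict(entry, hash=self._last_hash))

    def execute(self, change, approval, apply_fn):
        """Apply the change only if the approval is authentic, matches this
        exact change, and has not been revoked; log every attempt."""
        digest = change_digest(change)
        allowed = (hmac.compare_digest(sign(approval["id"], approval["digest"]),
                                       approval["sig"])
                   and approval["digest"] == digest
                   and approval["id"] not in self.revoked)
        self._record({"approval": approval["id"], "digest": digest,
                      "allowed": allowed})
        if allowed:
            apply_fn(change)
        return allowed

    def verify_log(self):
        """Recompute the hash chain; any edited entry breaks it."""
        prev = "0" * 64
        for entry in self.log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if body["prev"] != prev or recomputed != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

applied = []
broker = Broker()
change = {"resource": "deployment/web", "replicas": 3}
token = approve(change, "appr-1")
results = [
    broker.execute(change, token, applied.append),                      # approved
    broker.execute({**change, "replicas": 30}, token, applied.append),  # altered
]
broker.revoke("appr-1")
results.append(broker.execute(change, token, applied.append))           # revoked
print(results)  # [True, False, False]
```

Note that a hash chain like this makes tampering detectable rather than impossible; stronger guarantees would depend on where the log and keys are stored.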

Analysing drivers and interdependencies in European electricity markets using XAI

What actually drives electricity prices across Europe's interconnected power grid

Researchers used explainable AI (XAI) techniques to decode why electricity prices fluctuate across 39 European regions, revealing that solar power influences prices far more than its overall share of power generation would suggest. Gas prices remain the most consistent driver, and direct connections between countries' grids significantly reshape pricing in neighboring nations, showing how tightly Europe's electricity systems are now linked.

European governments and grid operators make billion-euro decisions about energy policy, transmission upgrades, and emergency reserves based on price forecasts. Understanding which factors actually move prices—rather than just predicting them—lets policymakers target the right levers: they might invest differently in solar storage if solar truly dominates price swings, or prioritize grid upgrades between countries if interconnections reshape regional economics. This analysis also shows what a genuinely unified European market would look like, crucial information as the EU pushes toward deeper energy integration.
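Explainable-AI methods such as SHAP attribute each prediction to the input features that produced it. The summary does not say which models or explainers the authors used, so the following is only a sketch with a linear model on synthetic data, where SHAP values have a closed form: with independent features, feature i contributes its coefficient times its deviation from the average. The feature names and coefficients are made up and say nothing about real European prices; the example shows how one prediction's attributions add up to its deviation from the average prediction, and how averaging attribution magnitudes ranks the drivers.

```python
import numpy as np

rng = np.random.default_rng(42)
features = ["gas_price", "solar_output", "net_imports"]   # hypothetical drivers
X = rng.normal(size=(500, 3))
# Synthetic prices with made-up coefficients, for illustration only
y = 40 + 8 * X[:, 0] - 5 * X[:, 1] + 2 * X[:, 2] + rng.normal(scale=1.0, size=500)

# Fit price ~ intercept + features by ordinary least squares
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, w = coef[0], coef[1:]

def linear_shap(x, w, X_background):
    """Exact SHAP values for a linear model with independent features:
    phi_i = w_i * (x_i - mean of feature i over the background data)."""
    return w * (x - X_background.mean(axis=0))

# Local explanation for one hour: attributions sum to prediction minus baseline
x = X[0]
phi = linear_shap(x, w, X)
prediction = intercept + w @ x
baseline = intercept + w @ X.mean(axis=0)   # average model output
print(np.isclose(phi.sum(), prediction - baseline))

# Global ranking: mean absolute attribution per feature
importance = np.abs(linear_shap(X, w, X)).mean(axis=0)
for name, value in sorted(zip(features, importance), key=lambda p: -p[1]):
    print(f"{name:>13}: mean |SHAP| = {value:.2f}")
```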