Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Computer Science · AI · Jun 19, 2026

Testing whether AI coding assistants work equally well in twelve languages, not just Python

Maria Ivanova, Pavel Zadorozhny, Rodion Levichev et al.
arXiv:2606.20517

Summary

Researchers extended LiveCodeBench, a major AI coding benchmark, from Python alone to twelve programming languages. Large language models performed significantly worse in non-Python languages, even on identical tasks. The evaluation of 24 models found clear evidence that these systems are overtrained on Python and struggle with language-specific code patterns.

Why it matters

Most programming benchmarks test AI only in Python, so companies have no reliable way to know whether these tools will work as well on their JavaScript, Java, C++, or Go codebases. This benchmark exposes real performance gaps that developers are likely to encounter in practice. It pushes AI model builders to create systems that actually generalize across the languages used in professional software development.

Posted on arXiv · Jun 18, 2026