SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

Computer Science · AI · Jun 12, 2026

Building better text search for Slovak without relying on expensive English-focused tools

Marek Šuppa, Andrej Ridzik, Daniel Hládek et al.
arXiv:2606.13647

Summary

Researchers created the first large-scale benchmark for testing text-search systems in Slovak, a language with limited AI resources, and found that existing Slovak language models don't work well for this task. They then built two smaller, faster Slovak models that match the performance of expensive commercial systems but can run on local computers without internet access.

Why it matters

Slovak speakers and businesses can now search documents and build AI systems that understand their language without paying for external APIs or waiting for cloud responses. The approach also offers a playbook for other under-resourced languages: the team released everything publicly so that others can repeat the process.

Posted on arXiv · Jun 11, 2026