KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

Computer Science · AI · May 13, 2026

How to run language models on massive texts without retraining them

Alireza Nadali, Patrick Cooper, Ashutosh Trivedi et al.
arXiv:2605.12471

Summary

Researchers showed that language models can process extremely long documents by treating their internal memory (the key-value cache) as a recurrence: the text is split into chunks, and each chunk updates the memory carried over from the previous one, with no retraining needed. The method works perfectly on retrieval tasks across documents up to 128,000 tokens long (roughly 100,000 words) on standard hardware, and accuracy holds even after more than 500 recurrence steps.
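To make the chunk-by-chunk update concrete, here is a minimal toy sketch of the general pattern, not the paper's implementation. The names `toy_step`, `fold_chunks`, and `find_token`, the list-of-pairs cache, and the chunk size of 256 are all hypothetical stand-ins chosen for illustration. A real system would run a model forward pass at each step and carry the model's actual key-value tensors.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a KV cache: one (position, token) entry per token seen.
Cache = List[Tuple[int, int]]


def split_into_chunks(tokens: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Cut the token sequence into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_step(cache: Cache, chunk: Sequence[int]) -> Cache:
    """One recurrence step: take the previous cache, return the updated one.

    A real model would run a forward pass over `chunk` conditioned on `cache`;
    this toy version only records each token with its absolute position.
    """
    start = len(cache)  # absolute position of the chunk's first token
    cache.extend((start + offset, token) for offset, token in enumerate(chunk))
    return cache


def fold_chunks(
    tokens: Sequence[int],
    chunk_size: int,
    step: Callable[[Cache, Sequence[int]], Cache],
    initial_cache: Cache,
) -> Tuple[Cache, int]:
    """Apply `step` once per chunk, threading the cache through every step."""
    cache = initial_cache
    steps = 0
    for chunk in split_into_chunks(tokens, chunk_size):
        cache = step(cache, chunk)
        steps += 1
    return cache, steps


def find_token(cache: Cache, needle: int) -> int:
    """Return the position of the first cache entry holding `needle`, or -1."""
    for position, token in cache:
        if token == needle:
            return position
    return -1


# 128,000 filler tokens (all 0) with one "needle" token (7) hidden early on.
tokens = [0] * 128_000
tokens[1234] = 7

# chunk_size=256 is an illustrative choice, not a value taken from the paper.
final_cache, num_steps = fold_chunks(tokens, 256, toy_step, [])
print(num_steps)                   # 500
print(find_token(final_cache, 7))  # 1234
```

The point of the sketch is the shape of the loop: each step sees only the current chunk plus the state handed over from the previous step, so the whole document never has to be processed in one pass. How the actual method updates and bounds that state is not described in this summary, and the toy cache here simply grows with every token.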

Why it matters

Current language models break down on very long documents because they run out of memory. KV-Fold addresses this without expensive retraining or architectural redesign: it works immediately on existing models. This makes it practical to search through massive documents, analyze long books, or process extended conversations on ordinary GPUs, expanding what these models can handle without slowing them down or requiring specialist infrastructure.

Posted on arXiv · May 12, 2026