Self-Augmenting Retrieval for Diffusion Language Models

Computer Science · AI Jun 5, 2026

Using a language model's uncertain guesses to find better information faster

Paul Jünger, Justin Lovelace, Linxi Zhao et al.
arXiv:2606.06474

Summary

Discrete diffusion language models generate text by repeatedly refining all words at once, discarding low-confidence predictions at each step. The authors found that these rejected words carry useful clues about what information the model will need, and built a system called SARDI that uses those clues to retrieve relevant facts during generation. On five question-answering benchmarks, SARDI outperformed existing methods while running up to 8 times faster.
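
The idea can be illustrated with a toy sketch. This is not SARDI's implementation: the function names, the 0.5 confidence threshold, the example sentences, and the word-overlap scorer are all assumptions made for illustration, and a real system would work with a diffusion model's token probabilities and a proper retriever. The sketch only shows the basic move of turning low-confidence predictions into a retrieval query.

```python
from typing import List, Tuple


def split_by_confidence(
    predictions: List[Tuple[str, float]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split one refinement step's (token, confidence) pairs into kept and rejected tokens."""
    kept = [tok for tok, conf in predictions if conf >= threshold]
    rejected = [tok for tok, conf in predictions if conf < threshold]
    return kept, rejected


def retrieve(query_tokens: List[str], documents: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by how many query tokens they contain (hypothetical stand-in scorer)."""
    query = {tok.lower() for tok in query_tokens}
    scored = []
    for doc in documents:
        overlap = len(query & set(doc.lower().split()))
        scored.append((overlap, doc))
    # Highest overlap first; documents with no overlap are never returned.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]


# Hypothetical draft from one refinement step: (token, confidence) per position.
predictions = [
    ("The", 0.95), ("Eiffel", 0.35), ("Tower", 0.40),
    ("is", 0.92), ("in", 0.90), ("Paris", 0.30),
]
documents = [
    "The Eiffel Tower is a landmark in Paris",
    "Mount Fuji is a volcano in Japan",
]

kept, rejected = split_by_confidence(predictions, threshold=0.5)
# The rejected (low-confidence) tokens become the retrieval query.
hits = retrieve(rejected, documents)
```

In this toy run the confident tokens are mostly function words, while the rejected ones name the entities the draft is unsure about, which is why they make a usable query.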

Why it matters

Retrieval-augmented systems currently have to choose what to look up before finalizing answers, often missing crucial facts or wasting computation on irrelevant searches. SARDI addresses this by peeking at the model's working process to retrieve information more intelligently, delivering more accurate answers in the same time or the same answers much faster. This matters for applications such as research assistants and chatbots that need both speed and accuracy.

Posted on arXiv · Jun 4, 2026