Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

Statistics Jun 21, 2026

Evaluating AI decisions when reward data goes missing in unpredictable patterns

Ziheng Wei, Annie Qu, Rui Miao
arXiv:2606.20206

Summary

When hospitals or companies use past data to test new decision-making strategies, the records are often incomplete: some rewards are never recorded, and others are hidden whenever they exceed a threshold. Because whether a reward goes missing can depend on the reward itself, standard evaluation methods that simply ignore the gaps can give biased answers. The researchers developed a statistical approach that uses future outcomes as clues to recover the missing information, so new policies can be evaluated even when the data is riddled with these gaps.
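To make the problem concrete, here is a minimal sketch of how threshold-hidden rewards distort a naive average. It is not the paper's method, only an illustration of the bias it targets. The reward distribution, the threshold value, and the function name `simulate` are all assumptions chosen for the example.

```python
import random

def simulate(n=100_000, threshold=1.0, seed=0):
    """Compare the true mean reward with the mean of the rewards that get recorded.

    Assumption (illustrative only): rewards are Gaussian, and any reward
    above `threshold` is never recorded.
    """
    rng = random.Random(seed)
    rewards = [rng.gauss(0.5, 1.0) for _ in range(n)]

    # Missing not at random: whether a reward is recorded depends on its own value.
    observed = [r for r in rewards if r <= threshold]

    true_mean = sum(rewards) / n
    naive_mean = sum(observed) / len(observed)  # average that ignores the gaps
    missing_rate = 1 - len(observed) / n
    return true_mean, naive_mean, missing_rate

true_mean, naive_mean, missing_rate = simulate()
print(f"true mean:    {true_mean:.3f}")
print(f"naive mean:   {naive_mean:.3f}")
print(f"missing rate: {missing_rate:.1%}")
```

In this toy setup the naive average comes out well below the true one, because the largest rewards are exactly the ones that go unrecorded. Collecting more data of the same kind does not close that gap.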

Why it matters

Healthcare systems and marketing platforms constantly evaluate whether new treatment or customer strategies would work better than current ones, but incomplete record-keeping undermines these tests. The method aims to correct the bias these gaps introduce, so that hospitals could test new care protocols and companies could validate strategy changes using the messy real-world data they actually have.

Posted on arXiv · Jun 18, 2026