PAPER PLAINE

Fresh research, simply explained. Updates twice daily.

The frame-level leakage trap: rethinking evaluation protocols for intrinsic image decomposition, with source-separable uncertainty as a case study

How near-identical test frames quietly inflate computer vision scores by up to 2 decibels

Researchers discovered that a common way of testing image-decomposition algorithms on the MPI Sintel dataset inflates performance scores by 1.6 to 2.0 decibels: spatially similar frames from the same scene leak into both the training and test sets. Evaluating with the correct protocol—splitting by scene rather than by frame—reveals that past reported results were significantly overstated. The team also proposes a model that estimates uncertainty separately for different image components, allowing it to identify and filter out unreliable pixels for a 77% error reduction.
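The leak itself is easy to demonstrate. Here is a toy sketch (not the paper's code; the scene names, frame counts, and split fraction are made up) contrasting a frame-level split, where near-duplicate frames from one scene straddle the train/test boundary, with a scene-level split that holds out whole scenes:

```python
import random

# Toy frame list: each frame belongs to a scene, and nearby frames in a
# scene look almost identical (hypothetical scene names and counts).
frames = [(scene, i) for scene in ["alley", "bamboo", "cave", "market"]
          for i in range(50)]

def frame_level_split(frames, test_frac=0.2, seed=0):
    """Leaky protocol: frames are shuffled individually, so near-duplicate
    frames from the same scene land in both train and test."""
    rng = random.Random(seed)
    shuffled = frames[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def scene_level_split(frames, test_scenes=frozenset({"market"})):
    """Leak-free protocol: every frame of a held-out scene goes to test."""
    train = [fr for fr in frames if fr[0] not in test_scenes]
    test = [fr for fr in frames if fr[0] in test_scenes]
    return train, test

train_f, test_f = frame_level_split(frames)
leaked = {s for s, _ in test_f} & {s for s, _ in train_f}
print("scenes leaking across the frame-level split:", sorted(leaked))

train_s, test_s = scene_level_split(frames)
print("scene overlap under scene-level split:",
      {s for s, _ in test_s} & {s for s, _ in train_s})
```

Under the frame-level split every test scene also appears in training, so the model is effectively graded on images it has nearly seen; the scene-level split leaves the overlap empty.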

Accurate evaluation standards prevent researchers from chasing inflated performance numbers and wasting effort on algorithms that aren't actually better. The proposed uncertainty method also has practical value: by flagging which pixels it's unsure about, it enables downstream applications to discard unreliable regions and achieve much cleaner results—useful for any system relying on image decomposition in graphics, robotics, or computational photography.
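The filtering idea is also simple to sketch. The snippet below is an illustration, not the paper's model: it fakes a predictor whose per-pixel uncertainty correlates with its per-pixel error (the property the method relies on), then discards pixels above an arbitrary confidence cutoff and checks that the error among the kept pixels drops:

```python
import random

rng = random.Random(42)

# Simulated per-pixel data (purely synthetic): each pixel has a model
# error and a model-reported uncertainty that correlates with it.
n_pixels = 10_000
uncertainty = [rng.random() for _ in range(n_pixels)]
error = [u * rng.random() for u in uncertainty]  # higher uncertainty -> larger possible error

def mean_error(errors):
    return sum(errors) / len(errors)

# Keep only pixels the model is confident about.
threshold = 0.3  # illustrative cutoff, not a value from the paper
kept = [e for e, u in zip(error, uncertainty) if u < threshold]

print(f"mean error, all pixels:  {mean_error(error):.4f}")
print(f"mean error, kept pixels: {mean_error(kept):.4f}")
print(f"fraction of pixels kept: {len(kept) / n_pixels:.2f}")
```

A downstream graphics or robotics pipeline would apply the same logic as a mask: use the decomposition only where the reported uncertainty is low, and fall back to something else elsewhere.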