EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

Computer Science · AI · May 16, 2026

Testing AI's ability to keep characters consistent across long video sequences

Ruozhen He, Meng Wei, Ziyan Yang et al.
arXiv:2605.15199

Summary

Researchers built EntityBench, a standardized benchmark for video-generation AI that measures whether systems can keep the same characters, objects, and locations consistent across long sequences of shots. The benchmark, based on real TV episodes, shows that existing systems struggle when characters reappear after long gaps. A new memory-based approach, EntityMem, achieved significantly better character consistency than existing methods.

Why it matters

Generating coherent multi-scene videos is a step toward AI that can create longer, more complex visual stories, from TV-like narratives to advertisements and films. Today, when a character leaves the frame for several minutes and then reappears, AI systems often render them looking completely different, which breaks the viewer's experience. EntityBench gives researchers a concrete way to measure this problem and track improvements, which could speed progress toward AI that maintains visual continuity over extended sequences.

Posted on arXiv · May 14, 2026