Learning to Move Before Learning to Do: Task-Agnostic Pretraining for VLAs

Computer Science · AI · Jul 3, 2026

Teaching robots to move first, then learn what tasks mean

Junhao Shi, Siyin Wang, Xiaopeng Yu et al.
arXiv:2607.02466

Summary

Researchers split the training of vision-language-action (VLA) robot models into two stages: first, learning basic movement skills from cheap, unlabeled footage; second, connecting those skills to language instructions using a small amount of expert data. The approach matched the performance of models trained on more than 1 million labeled examples while relying on far less costly supervision, and it performed 25 times better than competing methods when camera angles shifted unexpectedly.

Why it matters

Collecting labeled robot training data is expensive and slow, which makes it a major barrier to deploying AI robots at scale. By showing that robots can first learn useful movement patterns from cheap, unlabeled video, this work dramatically reduces the expert supervision needed to teach them new tasks. Real robots trained this way also kept working when their cameras were moved or tilted, a robustness gain that could make deployed systems practical rather than brittle.

Posted on arXiv · Jul 2, 2026