PAPER PLAINE

Learning to Move Before Learning to Do: Task-Agnostic Pretraining for VLAs

Teaching robots to move first, then learn what tasks mean

Researchers split robot learning into two stages: first, learning basic movement skills from cheap, unlabeled footage; second, connecting those skills to language instructions using a small amount of expert data. The approach matched the performance of models trained on more than 1 million labeled examples while relying on far cheaper supervision, and it performed 25 times better than competing methods when camera angles shifted unexpectedly.

Collecting labeled robot training data is expensive and slow, which makes it a major barrier to deploying AI robots at scale. By showing that robots can first learn useful movement patterns from cheap, unlabeled video, this work sharply reduces the amount of expert supervision needed to teach them new tasks. Real robots trained this way also kept working when their cameras were moved or tilted, a robustness gain that could make deployed systems practical rather than brittle.