A Step Toward World Models: A Survey on Robotic Manipulation
Surveys manipulation methods exhibiting world-model capabilities — bridges VLA models and explicit world models
Key Claims
Surveys robotic manipulation approaches that exhibit the core capabilities of world models: look-ahead prediction, counterfactual rollout, and goal-directed planning.
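To make the three capabilities concrete, here is a minimal Python sketch on a toy 1-D problem. Everything in it is an assumption made for illustration and not taken from the survey: `toy_dynamics` is a hand-written stand-in for a learned latent dynamics model, the state is a single gripper coordinate, and the planner is a brute-force search over a small discrete action set.

```python
from itertools import product


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model: position plus displacement."""
    return state + action


def rollout(model, state, actions):
    """Look-ahead prediction: predict the state after each action without executing it."""
    states = []
    for action in actions:
        state = model(state, action)
        states.append(state)
    return states


def counterfactual(model, state, plan_a, plan_b):
    """Counterfactual rollout: compare predicted final states of two plans from the same start."""
    return rollout(model, state, plan_a)[-1], rollout(model, state, plan_b)[-1]


def plan(model, state, goal, horizon=3, actions=(-1, 0, 1)):
    """Goal-directed planning: exhaustively search action sequences for the one ending nearest the goal."""
    return min(
        product(actions, repeat=horizon),
        key=lambda seq: abs(rollout(model, state, seq)[-1] - goal),
    )


if __name__ == "__main__":
    print(rollout(toy_dynamics, 0, [1, 1, -1]))
    print(counterfactual(toy_dynamics, 0, [1, 1], [-1, 0]))
    print(plan(toy_dynamics, 0, goal=2))
```

In a real system the model would be learned and the search replaced by something scalable (sampling-based or gradient-based planning, for example), but the division of labor is the same: the model predicts, and the planner queries it instead of the real robot.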
Positioning
This survey complements the VLM-VLA Robotic Manipulation Survey already in the KB. Where the VLA survey focuses on policies that map vision and language to actions, this one focuses on the internal model: whether and how manipulation systems build latent simulators.
Scope
- Explicit world-model approaches (e.g., latent dynamics models, RL in imagination)
- Implicit world-model capabilities in VLA systems
- Sim-to-real via learned simulators
- Task and motion planning guided by learned dynamics
Why This Matters
Robotic manipulation is the cleanest empirical test of whether a world model is "real." A visually impressive video-generation demo never has to interact with real physics; a robot that can plan a multi-step manipulation using a latent model has demonstrated something operational. V-JEPA 2-AC, H-WM, and StructVLA are the anchor papers for this measurement.
Notes
First-pass stub. Deeper ingestion deferred to next pass when cross-linking with the robotics KB topic.
Source: A Step Toward World Models: A Survey on Robotic Manipulation