PAPER · 2025-11-04 · Multiple · arXiv 2511.02097

A Step Toward World Models: A Survey on Robotic Manipulation

Multiple authors

COMPILED NOTES

Surveys manipulation methods exhibiting world-model capabilities — bridges VLA models and explicit world models

Key Claims

Surveys approaches in robotic manipulation that exhibit the core capabilities of world models: look-ahead prediction, counterfactual rollout, and goal-directed planning.
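To make the three capabilities concrete, here is a minimal toy sketch in Python. It is not taken from the survey: the `dynamics` function is a hand-written stand-in for a learned latent dynamics model, the 1-D "gripper position" state is a hypothetical simplification, and random shooting is just one simple planner chosen for illustration.

```python
import random


def dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model.

    The state is a 1-D gripper position; the action is a displacement
    clipped to [-1, 1]. A real system would use a learned network here.
    """
    return state + max(-1.0, min(1.0, action))


def rollout(state, actions):
    """Look-ahead prediction: states the model predicts for an action sequence."""
    states = []
    for action in actions:
        state = dynamics(state, action)
        states.append(state)
    return states


def plan(state, goal, horizon=5, n_candidates=200, seed=0):
    """Goal-directed planning by random shooting.

    Samples candidate action sequences, imagines each one with the model,
    and keeps the sequence whose final predicted state is closest to the goal.
    """
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = abs(rollout(state, actions)[-1] - goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


# Counterfactual rollout: same start state, two alternative action sequences.
push_right = rollout(0.0, [1.0, 1.0, 1.0])    # [1.0, 2.0, 3.0]
push_left = rollout(0.0, [-1.0, -1.0, -1.0])  # [-1.0, -2.0, -3.0]

# Planning: pick the imagined sequence that ends nearest the goal position.
actions, cost = plan(0.0, goal=2.5)
```

Nothing is executed on a robot in this sketch: every candidate is evaluated purely inside the model, which is the sense in which such systems plan "in imagination".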

Positioning

This survey complements the VLM-VLA Robotic Manipulation Survey already in the KB. Where the VLA survey focuses on policies that map vision+language to actions, this one focuses on the internal model: whether and how manipulation systems build latent simulators.

Scope

Explicit world-model approaches (e.g., latent dynamics models, RL in imagination)
Implicit world-model capabilities in VLA systems
Sim-to-real via learned simulators
Task and motion planning guided by learned dynamics

Why This Matters

Robotic manipulation is the cleanest empirical test of whether a world model is "real." A pretty video-generation demo doesn't touch physics; a robot that can plan a multi-step manipulation using a latent model has demonstrated something operational. V-JEPA 2-AC, H-WM, and StructVLA are the anchor papers for this test.

Notes

First-pass stub. Deeper ingestion deferred to next pass when cross-linking with the robotics KB topic.

Source: A Step Toward World Models: A Survey on Robotic Manipulation
