3D and 4D World Modeling: A Survey
COMPILED NOTES
Hierarchical taxonomy (VideoGen / OccGen / LiDARGen) for 3D/4D world models
Key Claims
Hierarchical taxonomy of 3D and 4D (3D + time) world models:
- VideoGen: video-centric generative approaches (Sora/Genie lineage)
- OccGen: occupancy-centric generation (4D occupancy forecasting, used heavily in autonomous vehicles (AV))
- LiDARGen: point-cloud/LiDAR-based generation for robotics and AV
Why This Matters
Clarifies a frequent confusion: the term "world model" can mean very different things depending on the underlying representation. A Sora-style video model, a 4D occupancy forecaster, and a LiDAR simulator are all called "world models", but they operate on fundamentally different data structures (image frames, voxel grids, and point clouds, respectively) and serve different downstream uses.
Useful companion to the Tsinghua and Embodied AI surveys because it slices on representation rather than function or application.
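To make the representation distinction concrete, here is a minimal sketch of the three sample layouts. The class names, field names, and sizes are illustrative assumptions, not taken from the survey, and the four-feature LiDAR point (x, y, z, intensity) is an assumed convention.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VideoClip:
    """VideoGen-style sample: a dense stack of RGB frames (hypothetical layout)."""
    frames: int
    height: int
    width: int
    channels: int = 3

    def shape(self) -> Tuple[int, int, int, int]:
        # Dense array layout: (time, height, width, channels)
        return (self.frames, self.height, self.width, self.channels)


@dataclass(frozen=True)
class OccupancySequence:
    """OccGen-style sample: one voxel grid per timestep, i.e. 3D + time."""
    steps: int
    x: int
    y: int
    z: int

    def shape(self) -> Tuple[int, int, int, int]:
        # Dense array layout: (time, x, y, z)
        return (self.steps, self.x, self.y, self.z)


@dataclass(frozen=True)
class LidarSequence:
    """LiDARGen-style sample: one unordered point set per sweep."""
    points_per_sweep: Tuple[int, ...]
    features: int = 4  # assumed: x, y, z, intensity

    def shapes(self) -> Tuple[Tuple[int, int], ...]:
        # Ragged layout: the point count can differ from sweep to sweep
        return tuple((n, self.features) for n in self.points_per_sweep)


# Illustrative sizes only
video = VideoClip(frames=16, height=256, width=256)
occupancy = OccupancySequence(steps=8, x=200, y=200, z=16)
lidar = LidarSequence(points_per_sweep=(31200, 29875, 30410))

print(video.shape())
print(occupancy.shape())
print(lidar.shapes())
```

Under these assumptions, the video and occupancy samples are fixed-shape dense arrays, while the LiDAR sample is a ragged sequence of point sets. That difference in layout is one plausible reason a model built for one representation does not transfer directly to another.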
Notes
First-pass stub. Deeper ingestion to follow when producing 3D/spatial-computing coverage.