A Survey of World Models for Autonomous Driving
Three-tiered taxonomy: future-world generation, behavior planning, integrated closed-loop systems
Key Claims (from abstract/summary)
Three-tiered taxonomy for driving-focused world models:
Tier 1 — Generation of Future Physical World:
- Image-based generation
- BEV (bird's-eye-view) based generation
- OG (occupancy grid) based generation
- PC (point cloud) based generation
- Methods in this tier improve scene-evolution modeling using diffusion models and 4D occupancy forecasting
Tier 2 — Behavior Planning for Intelligent Agents:
- Rule-driven paradigms
- Learning-based paradigms
- Cost-map optimization
- Reinforcement learning for trajectory generation in complex traffic
Tier 3 — Integrated closed-loop systems combining both.
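To make the three tiers concrete, here is a minimal sketch of how they could fit together in one planning tick. It is an illustration under stated assumptions, not a method from the survey: `forecast_occupancy` is a hypothetical stand-in for a learned occupancy-grid world model (it just shifts obstacles one row toward the ego vehicle per step), and `trajectory_cost` is a toy cost function in the spirit of cost-map optimization. All names, the grid layout, and the lane-change penalty are invented for the example.

```python
from typing import List

# grid[row][col] = occupancy probability; the ego vehicle sits in the last row.
Grid = List[List[float]]


def forecast_occupancy(grid: Grid, steps: int) -> List[Grid]:
    """Toy stand-in for a Tier 1 world model (assumption, not a learned model):
    every occupied cell moves one row toward the ego vehicle per step."""
    rows, cols = len(grid), len(grid[0])
    frames: List[Grid] = []
    current = grid
    for _ in range(steps):
        nxt = [[0.0] * cols for _ in range(rows)]
        for r in range(rows - 1):  # cells in the last row leave the grid
            for c in range(cols):
                nxt[r + 1][c] = current[r][c]
        frames.append(nxt)
        current = nxt
    return frames


def trajectory_cost(columns: List[int], frames: List[Grid], ego_col: int,
                    lane_change_penalty: float = 0.1) -> float:
    """Toy Tier 2 cost: predicted occupancy along the path plus a lateral-motion penalty."""
    ego_row = len(frames[0]) - 1
    collision = sum(frames[t][ego_row][c] for t, c in enumerate(columns))
    path = [ego_col] + list(columns)
    lateral = sum(abs(b - a) for a, b in zip(path, path[1:]))
    return collision + lane_change_penalty * lateral


def plan(grid: Grid, ego_col: int, candidates: List[List[int]]) -> List[int]:
    """Tier 3 in miniature: forecast the scene, then pick the cheapest candidate."""
    horizon = len(candidates[0])
    frames = forecast_occupancy(grid, horizon)
    return min(candidates, key=lambda cols: trajectory_cost(cols, frames, ego_col))


# 5x3 grid with one obstacle two rows ahead of the ego vehicle, in the ego lane.
grid = [[0.0] * 3 for _ in range(5)]
grid[2][1] = 1.0
candidates = [[1, 1], [0, 0], [2, 2]]  # keep lane, move left, move right
best = plan(grid, ego_col=1, candidates=candidates)
print(best)
```

In a closed-loop setting, `plan` would be re-run on each new observation and only the first step of the chosen trajectory executed. A real system would replace the shift rule with a learned forecaster and the hand-written cost with a learned or tuned one.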
Key Players Referenced
The driving world-model space includes Wayve (GAIA series), NVIDIA (Cosmos), Tesla's internal world models, and academic systems such as MILE and DriveDreamer. This survey organizes that lineage.
Why This Matters
Autonomous driving is the highest-stakes commercial deployment of world models today. Every major AV company runs some form of world model internally, using it for sim-to-real transfer, counterfactual evaluation, and data augmentation. The driving vertical will likely force answers to the pixel-vs-latent debate (whether to predict in pixel space or in a learned latent space) before the robotics vertical does, because its training budgets are larger.
Notes
Full paper not yet read in depth; this is a first-pass stub built from search results. Flag for a deeper read when we start producing commercial-layer coverage of the autonomous driving market structure.