Hierarchical Planning

Active Frontier

planning, world-models, robotics, TAMP

Human planning is multi-level: nobody plans a trip from New York to Tunis at the level of muscle movements. High-level decisions ("go to the airport") decompose into mid-level ones ("hail a taxi"), then low-level ones ("stand up, walk to the door"), and finally into subconscious motor control.

For AI systems, hierarchical planning means the world model operates at multiple levels of abstraction — higher levels make long-term, coarse predictions; lower levels manage short-term detailed actions. The payoff: horizons that are computationally intractable at the low level become tractable when most of the search happens at the abstract level.
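
The tractability argument can be made concrete with a toy search problem. The sketch below is illustrative only and is not taken from any of the cited systems: it assumes a hypothetical 16x16 grid world grouped into 4x4 blocks, with breadth-first search standing in for the planner at both levels. The names `bfs`, `moves`, `block_of`, `within`, and `hierarchical_plan` are invented for this example. The high level searches over blocks, and the low level searches over cells but only inside the current block and the next one.

```python
from collections import deque

def bfs(start, is_goal, neighbors):
    """Breadth-first search; returns (path, number of nodes expanded)."""
    parent = {start: None}
    queue = deque([start])
    expanded = 0
    while queue:
        node = queue.popleft()
        expanded += 1
        if is_goal(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], expanded
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None, expanded

SIZE, BLOCK = 16, 4  # assumed toy world: 16x16 cells, 4x4 cells per block

def moves(limit):
    """4-connected moves on a limit x limit grid."""
    def fn(p):
        x, y = p
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (x + dx, y + dy)
            if 0 <= q[0] < limit and 0 <= q[1] < limit:
                yield q
    return fn

def block_of(cell):
    """Abstraction function: map a cell to the block that contains it."""
    return (cell[0] // BLOCK, cell[1] // BLOCK)

def within(blocks):
    """Cell-level moves restricted to the given set of blocks."""
    cell_moves = moves(SIZE)
    return lambda p: (q for q in cell_moves(p) if block_of(q) in blocks)

def hierarchical_plan(start, goal):
    # High level: coarse search over blocks only.
    coarse, expanded = bfs(block_of(start),
                           lambda b: b == block_of(goal),
                           moves(SIZE // BLOCK))
    path = [start]
    # Low level: for each coarse step, search cells in the current and next block.
    for here, nxt in zip(coarse, coarse[1:]):
        segment, n = bfs(path[-1], lambda c: block_of(c) == nxt,
                         within({here, nxt}))
        path += segment[1:]
        expanded += n
    # Final refinement inside the goal block.
    segment, n = bfs(path[-1], lambda c: c == goal, within({block_of(goal)}))
    path += segment[1:]
    return path, expanded + n

flat_path, flat_expanded = bfs((0, 0), lambda c: c == (15, 15), moves(SIZE))
hier_path, hier_expanded = hierarchical_plan((0, 0), (15, 15))
print("flat expansions:", flat_expanded, "hierarchical expansions:", hier_expanded)
```

In this toy setting the hierarchical planner expands fewer nodes than the flat search because each low-level search is confined to a small region. The trade-off is that the resulting path is not guaranteed to be the shortest one, since the coarse plan commits to a block sequence before any detailed search happens.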

Key Claims

Hierarchical world models mitigate error accumulation — H-WM's symbolic-plus-visual architecture reduces drift in long-horizon TAMP problems by using the symbolic layer as a stabilizing prior. Evidence: moderate (H-WM)
Structured sparse prediction is a form of hierarchy — StructVLA predicts physically meaningful keyframes rather than every frame; this is a temporal hierarchy (skip unimportant frames) analogous to spatial hierarchies (symbolic-plus-visual). Evidence: moderate (StructVLA)
Most current world models are single-level — V-JEPA 2 and Genie 3 operate at a single abstraction level; hierarchical extension is an open research direction. Evidence: inferred from paper descriptions
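
The error-accumulation claim can be illustrated with a deliberately simple rollout. This is a toy sketch, not H-WM's actual mechanism: it assumes a hypothetical one-dimensional state, a low-level model (`learned_step`) that overshoots by a fixed 0.05 per step, and a coarse layer (`coarse_waypoint`) that is assumed to be exact at every tenth step. All names and numbers are invented for illustration.

```python
def learned_step(x):
    # Hypothetical imperfect low-level model: true motion is +1.0 per step,
    # but the model predicts +1.05, so each step adds 0.05 of error.
    return x + 1.05

def coarse_waypoint(x0, t):
    # Assumed coarse (symbolic-level) prediction, taken to be exact here.
    return x0 + float(t)

def flat_rollout(x0, horizon):
    """Roll the low-level model forward with no correction."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(learned_step(xs[-1]))
    return xs

def anchored_rollout(x0, horizon, every):
    """Roll forward, but snap to the coarse waypoint every `every` steps."""
    xs = [x0]
    for t in range(1, horizon + 1):
        x = learned_step(xs[-1])
        if t % every == 0:
            x = coarse_waypoint(x0, t)  # re-anchor to the coarse prediction
        xs.append(x)
    return xs

def max_error(xs, x0):
    """Largest absolute deviation from the true trajectory x0 + t."""
    return max(abs(x - (x0 + t)) for t, x in enumerate(xs))

flat_err = max_error(flat_rollout(0.0, 100), 0.0)
anchored_err = max_error(anchored_rollout(0.0, 100, every=10), 0.0)
print("flat max error:", flat_err, "anchored max error:", anchored_err)
```

Under these assumptions the flat rollout's error grows with the horizon, while the anchored rollout's error is bounded by what can accumulate between two waypoints. Real systems do not have an exact coarse layer, so the benefit depends on how reliable the abstract predictions are.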

Open Questions

How many levels of hierarchy are optimal? Two (symbolic + visual) is common; is more better?
Should the levels be hand-designed or learned?
Can JEPA representations at different layers serve as a natural hierarchy, or does hierarchy require explicit architectural separation?
How do you train a hierarchical world model end-to-end without the levels collapsing into each other?

Related Concepts

World Models — hierarchy is a structural property
System 2 Reasoning — hierarchical planning is a specific System 2 recipe
Joint Embedding Predictive Architecture (JEPA) — candidate substrate for hierarchical world models

Changelog

2026-04-22 — Initial compilation from H-WM and StructVLA.

Theses that depend on this concept

These research positions cite this concept in their evidence. If the concept changes materially, these theses may need re-scoring.

T4B (never reviewed)

JEPA and generative world models will specialize to different use cases (control vs. simulation), not converge on a single architecture

6.0/10

no history yet

Sources

hierarchical-world-model-tamp, structvla-beyond-dense-futures, v-jepa-2
