Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation
Two-stage training (800h human video + 30h robot data) outperforms baselines with 10× more data
Key Claims
- Two-stage training recipe — autoregressive VLM pre-training on large-scale egocentric human videos → flow-based action expert post-training on humanoid robot data
- Data efficiency — 800 hours of human video + 30 hours of real robot data outperforms baselines pre-trained on more than 10× as much data
- Open foundation model — released with weights and recipe, intended for community extension
- Loco-manipulation focus — tasks combining locomotion and manipulation (e.g., walking to a counter and picking up an object), the hardest regime for humanoids
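To make the second stage of the recipe concrete, here is a minimal sketch of a flow-matching objective for an action expert. It assumes a linear (rectified-flow style) path between Gaussian noise and the demonstrated action; this note does not record which flow formulation Ψ₀ actually uses. The names `flow_matching_targets`, `flow_matching_loss`, `sample_actions`, and `predict_velocity` are hypothetical and are not taken from the Ψ₀ release. In a real system `predict_velocity` would also be conditioned on the VLM's features, which are omitted here.

```python
import numpy as np

def flow_matching_targets(actions, rng):
    """Build training triples for a linear-path flow-matching loss.

    actions: array of shape (batch, action_dim) from robot demonstrations.
    Assumption: straight-line path x_t = (1 - t) * noise + t * action.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(size=(actions.shape[0], 1))   # one time in [0, 1) per sample
    x_t = (1.0 - t) * noise + t * actions         # point on the noise-to-action path
    target_velocity = actions - noise             # constant velocity along that path
    return x_t, t, target_velocity

def flow_matching_loss(predict_velocity, actions, rng):
    """Mean squared error between predicted and target velocity."""
    x_t, t, target = flow_matching_targets(actions, rng)
    return float(np.mean((predict_velocity(x_t, t) - target) ** 2))

def sample_actions(predict_velocity, shape, rng, steps=10):
    """Generate actions by Euler-integrating the velocity field from t=0 to t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1), i * dt)
        x = x + dt * predict_velocity(x, t)
    return x
```

At training time the loss would be minimized over the action expert's parameters; at inference time `sample_actions` turns noise into an action in a small, fixed number of integration steps.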
Why This Matters
Ψ₀ is the humanoid analog of V-JEPA 2-AC: it follows the same philosophy (passive video pre-training plus a small interaction dataset) but applies it to a humanoid form factor rather than a Franka arm. The data-efficiency claim (beating baselines pre-trained on more than 10× as much data), if it holds at scale, has major cost implications for humanoid developers: if 800 hours of human video can replace thousands of hours of robot teleoperation, the talent and capex requirements for a humanoid foundation model drop significantly.
This connects directly to the AI KB's world-models research: the same architectural insight (pre-training an abstract/structured representation, then adding a small action-conditioned adapter) has now been validated across at least two robotic form factors.
Notes
First-pass stub. Flag for deeper read when producing humanoid market-structure coverage.
Source: Ψ₀