PAPER · 2026-03-14 · Multiple · arXiv 2603.12263

Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

Multiple authors

COMPILED NOTES

Two-stage training (800h human video + 30h robot data) outperforms baselines pre-trained on more than 10× as much data

Ψ₀: Universal Humanoid Loco-Manipulation Foundation Model

Key Claims

Two-stage training recipe — autoregressive VLM pre-training on large-scale egocentric human videos → flow-based action expert post-training on humanoid robot data
Data efficiency — 800 hours of human video + 30 hours of real robot data outperforms baselines pre-trained on more than 10× as much data
Open foundation model — released with weights and recipe, intended for community extension
Loco-manipulation focus — tasks combining locomotion and manipulation (e.g., walking to a counter and picking up an object), the hardest regime for humanoids
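The notes describe the second stage only as "flow-based", so the following is a rough illustration of what such an objective commonly looks like, not Ψ₀'s actual implementation. It is a minimal NumPy sketch of generic linear-path flow matching: interpolate between Gaussian noise and a demonstrated action, regress the constant velocity between them, and sample by Euler integration. Every name here (`flow_matching_pair`, `flow_matching_loss`, `euler_sample`, `velocity_fn`, `cond`) is hypothetical, and `cond` merely stands in for whatever features the pre-trained VLM would supply.

```python
import numpy as np


def flow_matching_pair(actions, rng):
    """Build one training example per row for linear-path flow matching.

    Assumption: a straight-line path x_t = (1 - t) * noise + t * action,
    whose target velocity is the constant (action - noise).
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(size=(actions.shape[0], 1))  # one time in [0, 1) per row
    x_t = (1.0 - t) * noise + t * actions
    target_v = actions - noise
    return x_t, t, target_v


def flow_matching_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((pred_v - target_v) ** 2))


def euler_sample(velocity_fn, cond, action_dim, steps, rng):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps.

    velocity_fn is a hypothetical stand-in for a trained action expert;
    cond is a hypothetical stand-in for the conditioning features.
    """
    x = rng.standard_normal((cond.shape[0], action_dim))
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((cond.shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

In a real system `velocity_fn` would be a learned network trained by minimizing `flow_matching_loss` on the robot data; the sketch leaves the model itself out because the notes give no architectural detail.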

Why This Matters

Ψ₀ is the humanoid analog of V-JEPA 2-AC: the same philosophy (passive video pre-training plus a small interaction dataset), applied to a humanoid form factor rather than a Franka arm. The data-efficiency claim (outperforming baselines pre-trained on more than 10× as much data), if it holds at scale, has major cost implications for humanoid developers: if 800 hours of human video can substitute for thousands of hours of robot teleoperation, the talent and capex requirements for a humanoid foundation model drop significantly.

Direct connection to the AI KB's world-models research: the same architectural insight (abstract/structured representation pre-training plus a small action-conditioned adapter) is now reported to work across at least two robotic form factors, a Franka arm and a humanoid.

Notes

First-pass stub. Flag for deeper read when producing humanoid market-structure coverage.

Source: Ψ₀
