Humanoid Loco-Manipulation

Active Frontier
humanoid, locomotion, manipulation, whole-body

Loco-manipulation is the problem of coordinating locomotion (walking, balancing, navigating) and manipulation (grasping, carrying, placing) at the same time on a humanoid robot. Unlike an industrial arm bolted to the floor, a humanoid must maintain dynamic balance while its arms exert forces on the environment, which makes this one of the hardest open problems in robotics.

Gu et al. provide a comprehensive survey spanning three decades of model-based approaches, from early ZMP (Zero Moment Point) controllers through modern learning-based methods. The survey documents the progression from separate locomotion and manipulation pipelines toward integrated whole-body frameworks.
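
For readers unfamiliar with ZMP, the following is a minimal sketch under the linear inverted pendulum model. It is an illustration chosen here, not a method taken from the survey. With the center of mass (CoM) held at constant height, the ZMP along one horizontal axis is the CoM position minus (height / g) times the CoM acceleration, and the robot is treated as balanced while the ZMP stays inside the support region. The function names and all numeric values are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2


def zmp_lipm(com_pos, com_acc, com_height):
    """ZMP along one horizontal axis under the linear inverted pendulum model.

    com_pos and com_height are in metres, com_acc in m/s^2.
    """
    return com_pos - (com_height / G) * com_acc


def is_balanced(zmp, support_min, support_max):
    """True if the ZMP lies inside a 1-D support interval (e.g. the foot length)."""
    return support_min <= zmp <= support_max


# Hypothetical numbers: CoM 0.02 m ahead of the ankle, 0.8 m high,
# accelerating forward at 1.0 m/s^2; foot spans -0.05 m to 0.15 m.
p = zmp_lipm(com_pos=0.02, com_acc=1.0, com_height=0.8)
print(round(p, 4), is_balanced(p, support_min=-0.05, support_max=0.15))
```

In this example the forward acceleration shifts the ZMP behind the heel, so the check fails even though the CoM itself is above the foot. This is the kind of coupling that makes arm motions and contact forces matter for balance.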

Wen et al. take a fundamentally different approach with Humanoid-COA (Chain of Action), using foundation models to achieve zero-shot loco-manipulation without task-specific training. Their framework decomposes natural language instructions into executable whole-body behaviors, achieving 96.6% grasping accuracy and 90% mobile pick success on Unitree H1-2 and G1 platforms.

Key Claims

  • Three decades of model-based approaches surveyed — Gu et al. comprehensively cover the evolution from ZMP controllers to modern learning-based loco-manipulation, establishing the landscape of model fidelity vs. computational efficiency trade-offs. Evidence: strong (Humanoid Locomotion & Manipulation Survey)
  • 96.6% grasping and 90% mobile pick achieved zero-shot — Wen et al.'s CoA framework decomposes high-level language instructions into executable whole-body behaviors without any task-specific training data. Evidence: strong (Humanoid-COA: Chain of Action)
  • Long-horizon combined tasks remain at 56-63% success — While individual subtasks show high accuracy, chaining locomotion and manipulation over extended task sequences remains significantly harder. Evidence: strong (Humanoid-COA: Chain of Action)
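
One rough way to see why chained tasks score lower, offered as intuition rather than an analysis from the paper: if a sequence succeeds only when every step succeeds, and steps are assumed independent (an idealization), the sequence success rate is the product of the per-step rates. In the sketch below only the 96.6% and 90% figures come from the reported results; the function names and the choice of steps are hypothetical.

```python
from math import prod


def chained_success(step_rates):
    """Success rate of a sequence, assuming independent steps that must all succeed."""
    return prod(step_rates)


def steps_until_below(rate, threshold):
    """Smallest number of identical steps whose chained success falls below threshold."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    n, p = 0, 1.0
    while p >= threshold:
        n += 1
        p *= rate
    return n


grasp, mobile_pick = 0.966, 0.90  # per-subtask rates quoted above
print(f"{chained_success([grasp, mobile_pick]):.3f}")
print(steps_until_below(mobile_pick, 0.63))
```

Under these assumptions, a chain of five steps at 90% each already falls below 63%. Real failures are correlated and steps differ in difficulty, so this should be read only as a back-of-the-envelope explanation of the gap.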

Open Questions

  • Can long-horizon combined tasks be improved beyond 56-63% success without sacrificing zero-shot generalization?
  • How do model-based and learning-based approaches compare on robustness across diverse real-world environments?
  • What role does tactile feedback play in closing the gap for contact-rich manipulation during locomotion?
  • How can computational efficiency be balanced against model fidelity for real-time whole-body control?

Related Entities

  • Unitree — H1-2 and G1 platforms used for CoA experiments
  • Figure AI — Pursuing loco-manipulation via imitation learning
