Humanoid Agent via Embodied Chain-of-Action Reasoning for Zero-Shot Loco-Manipulation
Paper | Congcong Wen et al. | NYU, Harvard, UCL, University of Liverpool | April 13, 2025
Key Contribution
First zero-shot loco-manipulation framework using foundation models on physical humanoids
Abstract
Introduces Humanoid-COA, a framework for humanoid loco-manipulation that integrates whole-body movement with object manipulation, driven by natural language instructions. Its Embodied Chain-of-Action (CoA) mechanism decomposes high-level human instructions into structured sequences of locomotion and manipulation primitives through affordance analysis, spatial reasoning, and whole-body action planning.
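To make the idea of a "structured sequence of locomotion and manipulation primitives" concrete, here is a minimal Python sketch of one possible plan representation. The class names, the primitive names (`walk_to`, `grasp`, `place`), and the example instruction are all hypothetical and chosen only for illustration; the summary above does not specify the paper's actual primitive set or data format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PrimitiveKind(Enum):
    LOCOMOTION = "locomotion"
    MANIPULATION = "manipulation"


@dataclass(frozen=True)
class Primitive:
    kind: PrimitiveKind
    name: str    # hypothetical primitive name, e.g. "walk_to"
    target: str  # object or region the primitive refers to


def example_plan() -> List[Primitive]:
    """Hand-written plan for the illustrative instruction
    'bring the cup to the table' (not taken from the paper)."""
    return [
        Primitive(PrimitiveKind.LOCOMOTION, "walk_to", "cup"),
        Primitive(PrimitiveKind.MANIPULATION, "grasp", "cup"),
        Primitive(PrimitiveKind.LOCOMOTION, "walk_to", "table"),
        Primitive(PrimitiveKind.MANIPULATION, "place", "table"),
    ]


def is_well_formed(plan: List[Primitive]) -> bool:
    """Check one simple ordering rule: every 'place' must follow a
    'grasp', and the robot never grasps while already holding something."""
    holding = False
    for step in plan:
        if step.name == "grasp":
            if holding:
                return False
            holding = True
        elif step.name == "place":
            if not holding:
                return False
            holding = False
    return True
```

A structured plan like this can be validated before anything is sent to a robot, which is one reason to prefer it over free-form text: `is_well_formed(example_plan())` holds, while a plan that starts with `place` is rejected.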
Key Contributions
- First humanoid agent framework integrating foundation model reasoning for zero-shot loco-manipulation under natural language instructions
- Novel Embodied CoA mechanism decomposing high-level intent into executable whole-body behaviors for long-horizon tasks
- Real-world validation demonstrating robust zero-shot generalization across diverse loco-manipulation tasks on two physical platforms
Methodology
Perception-reasoning-action paradigm:
- Perception: GPT-4V converts RGB-D observations into scene descriptions
- Reasoning: CoA integrates object affordance analysis, region spatial reasoning, and whole-body movement inference
- Execution: Grounds symbolic plans into motor commands via pre-trained controllers
- Tested on Unitree H1-2 and G1 humanoid robots
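The perception-reasoning-action flow can be sketched as three functions wired together. This is a toy illustration under stated assumptions, not the authors' implementation: `perceive` is a stub standing in for the vision-language model (no real GPT-4V call is made), `reason` replaces the three CoA stages with simple string rules, and the controller functions are hypothetical placeholders for the pre-trained controllers.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (hypothetical primitive name, target)


def perceive(observation: Dict[str, str]) -> str:
    """Stub for the perception stage: turn an {object: region} mapping
    into a textual scene description."""
    return "; ".join(f"{obj} at {region}" for obj, region in sorted(observation.items()))


def reason(instruction: str, scene: str) -> List[Step]:
    """Stub for CoA reasoning, with one toy rule per stage."""
    located = dict(entry.split(" at ", 1) for entry in scene.split("; "))
    # 1. Object affordance analysis: pick the scene object the instruction names.
    target = next((obj for obj in located if obj in instruction), None)
    if target is None:
        raise ValueError("instruction mentions no object in the scene")
    # 2. Region spatial reasoning: look up where that object is.
    region = located[target]
    # 3. Whole-body movement inference: approach first, then manipulate.
    return [("walk_to", region), ("grasp", target)]


def execute(plan: List[Step], controllers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Ground each symbolic step by dispatching it to a controller."""
    return [controllers[name](target) for name, target in plan]


def run_pipeline(instruction: str, observation: Dict[str, str],
                 controllers: Dict[str, Callable[[str], str]]) -> List[str]:
    scene = perceive(observation)
    plan = reason(instruction, scene)
    return execute(plan, controllers)


# Placeholder controllers that only report what they would do.
DEMO_CONTROLLERS: Dict[str, Callable[[str], str]] = {
    "walk_to": lambda region: f"walked to {region}",
    "grasp": lambda obj: f"grasped {obj}",
}
```

Keeping the three stages behind separate functions mirrors the paradigm's separation of concerns: the symbolic plan returned by `reason` can be inspected or logged before `execute` turns it into controller calls.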
Results
- Manipulation: 96.6% grasping, 93.3% relocation, 73.3% rearrangement
- Locomotion: 96.6% target approach, 63.3% navigation under occlusion
- Loco-Manipulation: 90.0% mobile pick, 96.6% mobile place, 63.3% long-horizon combined
- Without all three CoA components: only 50% executability
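For a quick comparison across the three task categories, the short Python sketch below tabulates the success rates listed above and derives an unweighted mean and the weakest task per category. These aggregates are computed here for illustration only and are not figures reported in the source; the dictionary layout and function names are assumptions.

```python
from typing import Dict, Tuple

# Success rates (%) copied from the list above.
RESULTS: Dict[str, Dict[str, float]] = {
    "Manipulation": {"grasping": 96.6, "relocation": 93.3, "rearrangement": 73.3},
    "Locomotion": {"target approach": 96.6, "navigation under occlusion": 63.3},
    "Loco-Manipulation": {
        "mobile pick": 90.0,
        "mobile place": 96.6,
        "long-horizon combined": 63.3,
    },
}


def category_mean(rates: Dict[str, float]) -> float:
    """Unweighted mean success rate over the tasks in one category."""
    return sum(rates.values()) / len(rates)


def weakest_task(rates: Dict[str, float]) -> Tuple[str, float]:
    """Task with the lowest success rate in one category."""
    name = min(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    for category, rates in RESULTS.items():
        name, rate = weakest_task(rates)
        print(f"{category}: mean {category_mean(rates):.1f}%, weakest: {name} ({rate}%)")
```

In each category the weakest task is the one with the most steps or the least direct visibility (rearrangement, navigation under occlusion, long-horizon combined), which matches the limitations listed in the summary.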
Limitations
- Complex rearrangement tasks show lower reliability (73.3%)
- Long-horizon combined tasks remain challenging (56-63%)
- Dependence on pre-trained foundation models (GPT-4, GPT-4V)
- No adaptation mechanism for failure recovery
Source: Humanoid-COA by Wen et al., NYU/Harvard/UCL
Tags
humanoid, foundation-models, loco-manipulation, zero-shot, unitree