Humanoid Agent via Embodied Chain-of-Action Reasoning for Zero-Shot Loco-Manipulation

Congcong Wen et al. | NYU, Harvard, UCL, University of Liverpool | April 13, 2025

Key Contribution

First zero-shot loco-manipulation framework using foundation models on physical humanoids

Abstract

Introduces Humanoid-COA, a framework for humanoid loco-manipulation that combines whole-body movement with object manipulation, driven by natural language instructions. Its Embodied Chain-of-Action (CoA) mechanism decomposes high-level human instructions into structured sequences of locomotion and manipulation primitives through affordance analysis, spatial reasoning, and whole-body action planning.
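
To make "structured sequences of locomotion and manipulation primitives" concrete, here is a minimal sketch of how such a plan could be represented and sanity-checked. Everything in it is an assumption for illustration: the `Primitive` class, the primitive names (`walk_to`, `grasp`, `place`), the example instruction, and the `is_executable` check are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    LOCOMOTION = "locomotion"
    MANIPULATION = "manipulation"


@dataclass(frozen=True)
class Primitive:
    kind: Kind
    name: str    # hypothetical primitive name, e.g. "walk_to"
    target: str  # object or region the primitive acts on


def is_executable(plan: list) -> bool:
    """Toy consistency check: never grasp while holding, never place empty-handed."""
    holding = False
    for step in plan:
        if step.name == "grasp":
            if holding:
                return False
            holding = True
        elif step.name == "place":
            if not holding:
                return False
            holding = False
    return True


# Hypothetical decomposition of "move the cup from the table to the shelf".
plan = [
    Primitive(Kind.LOCOMOTION, "walk_to", "table"),
    Primitive(Kind.MANIPULATION, "grasp", "cup"),
    Primitive(Kind.LOCOMOTION, "walk_to", "shelf"),
    Primitive(Kind.MANIPULATION, "place", "cup"),
]

print(is_executable(plan))      # full plan passes the toy check
print(is_executable(plan[2:]))  # placing without a prior grasp fails it
```

The check only illustrates why an ordered, typed sequence is useful: ordering errors can be caught before any motor command is issued. The paper's own notion of executability may be defined differently.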

Key Contributions

First humanoid agent framework integrating foundation model reasoning for zero-shot loco-manipulation under natural language instructions
Novel Embodied CoA mechanism decomposing high-level intent into executable whole-body behaviors for long-horizon tasks
Real-world validation demonstrating robust zero-shot generalization across diverse loco-manipulation tasks on two physical platforms

Methodology

Perception-reasoning-action paradigm:

Perception: GPT-4V converts RGB-D observations into scene descriptions
Reasoning: CoA integrates object affordance analysis, region spatial reasoning, whole-body movement inference
Execution: Grounds symbolic plans into motor commands via pre-trained controllers
Tested on Unitree H1-2 and G1 humanoid robots
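
The three stages above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the authors' implementation: the vision-language model is replaced by a stub that reads pre-labelled objects, the three reasoning steps are reduced to table lookups, and the controllers are plain functions that return log strings. All names (`perceive`, `reason`, `execute`, `AFFORDANCES`, `controllers`) are hypothetical.

```python
# Hypothetical affordance table: object -> manipulation primitive it supports.
AFFORDANCES = {"bottle": "grasp", "box": "lift"}


def perceive(observation):
    # Paper: a vision-language model describes the RGB-D scene.
    # Stub here: read pre-labelled (object, region) pairs into a dict.
    return {obj: region for obj, region in observation["labels"]}


def reason(target, scene, robot_region):
    # 1) Object affordance analysis: which primitive fits the target object?
    action = AFFORDANCES[target]
    # 2) Region spatial reasoning: where is the target located?
    region = scene[target]
    # 3) Whole-body movement inference: walk only if the robot is elsewhere.
    plan = []
    if region != robot_region:
        plan.append(("walk_to", region))
    plan.append((action, target))
    return plan


def execute(plan, controllers):
    # Ground each symbolic step by dispatching to a controller callable.
    return [controllers[name](arg) for name, arg in plan]


# Stand-ins for pre-trained low-level controllers.
controllers = {
    "walk_to": lambda region: f"locomotion controller: walk to {region}",
    "grasp": lambda obj: f"arm controller: grasp {obj}",
    "lift": lambda obj: f"arm controller: lift {obj}",
}

observation = {"labels": [("bottle", "table"), ("box", "floor")]}
scene = perceive(observation)
plan = reason("bottle", scene, robot_region="door")
log = execute(plan, controllers)
print(plan)
print(log)
```

Keeping the plan symbolic until `execute` mirrors the separation the summary describes: reasoning produces a structured plan, and only the last stage touches platform-specific controllers.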

Results

Manipulation: 96.6% grasping, 93.3% relocation, 73.3% rearrangement
Locomotion: 96.6% target approach, 63.3% navigation under occlusion
Loco-Manipulation: 90.0% mobile pick, 96.6% mobile place, 63.3% long-horizon combined
Without all three CoA components: only 50% executability

Limitations

Complex rearrangement tasks show lower reliability (73.3%)
Long-horizon combined tasks remain challenging (56-63%)
Dependence on pre-trained foundation models (GPT-4, GPT-4V)
No adaptation mechanism for failure recovery

Source: Humanoid-COA by Wen et al., NYU/Harvard/UCL

Identifiers

arXiv:2504.09532