T3 (never reviewed)
Foundation models (VLMs) will replace task-specific training as the dominant control paradigm within 3 years
Conviction: 6.0/10
Trajectory: no history yet
Last reviewed: —
Humanoid-COA's 96.6% grasping and 90% mobile pick via GPT-4V — with zero task-specific training — is the strongest evidence that the future of robot control is general-purpose reasoning, not per-task RL policies.
Confidence: 7/10
Supporting evidence:
- 96.6% zero-shot grasping on physical Unitree robots via foundation models. Evidence: strong (Humanoid-COA)
- LLMs autonomously designing reward functions for tactile manipulation. Evidence: strong (Text2Touch)
- Pattern: VLM reasoning -> task decomposition -> pre-trained execution becoming standard. Evidence: strong (Frontier)
- Teleoperation data + simulation scaling works for bootstrapping. Evidence: moderate (1X NEO, Figure)
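The "VLM reasoning -> task decomposition -> pre-trained execution" pattern above can be sketched as a control loop: a VLM turns an instruction (plus camera input) into a sequence of calls against a library of pre-trained skills. A minimal sketch, assuming hypothetical names throughout (`decompose`, `SKILL_LIBRARY`, `SkillCall` are illustrative, not from Humanoid-COA or any cited system):

```python
from dataclasses import dataclass, field


@dataclass
class SkillCall:
    """One step emitted by the reasoning layer: a skill name and its arguments."""
    name: str
    args: dict = field(default_factory=dict)


# Pre-trained low-level skills the planner composes. Stand-ins for learned
# policies (grasping, locomotion, placement) that run on the robot.
SKILL_LIBRARY = {
    "locate": lambda target: {"pose": (0.4, 0.1, 0.8)},
    "grasp": lambda pose: {"grasped": True},
    "place": lambda pose: {"placed": True},
}


def decompose(instruction: str) -> list[SkillCall]:
    """Stand-in for the VLM call: instruction + image in, skill sequence out."""
    # A real system would send the camera frame and instruction to a VLM API
    # here and parse its structured output; this canned plan is for illustration.
    return [
        SkillCall("locate", {"target": "cup"}),
        SkillCall("grasp", {"pose": (0.4, 0.1, 0.8)}),
        SkillCall("place", {"pose": (0.0, 0.5, 0.8)}),
    ]


def run(instruction: str) -> list[str]:
    """Execute the decomposed plan against the skill library."""
    executed = []
    for call in decompose(instruction):
        SKILL_LIBRARY[call.name](*call.args.values())
        executed.append(call.name)
    return executed


print(run("put the cup on the shelf"))
```

The key property this sketch captures is that the reasoning layer is general-purpose while the execution layer stays pre-trained and fixed, which is why no task-specific training is needed per new task. The API-latency risk noted below sits in the `decompose` call.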
Challenging evidence:
- Long-horizon combined tasks still only reach 56-63% success; that gap is significant for production use
- Dependence on external APIs introduces latency and availability risks
- Contact-rich manipulation (assembly, cooking) may still require specialized policies
- LLM-designed reward functions are untested for fine force control tasks
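To make the last point concrete: Text2Touch-style systems have the LLM emit reward code for tactile tasks, which is then validated in simulation before training. A minimal sketch of the *shape* of such a reward, with invented terms and weights (not the actual Text2Touch rewards), shows why fine force control is the untested case:

```python
def llm_proposed_reward(contact_force: float, target_force: float,
                        slip: float) -> float:
    """Illustrative reward an LLM might emit as code for a tactile grasp:
    penalize deviation from a target contact force and penalize slip.
    Weights (1.0, 5.0) are invented for this sketch."""
    force_err = abs(contact_force - target_force)
    return -1.0 * force_err - 5.0 * slip


# For fine force control the margin between "firm grasp" and "crushed object"
# is small, so hand-picked penalty weights like these may shape behavior badly;
# that is the untested regime flagged above.
print(llm_proposed_reward(contact_force=1.5, target_force=1.0, slip=0.1))
```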
Evolution:
- Apr 5, 2026 — Initial thesis at 7/10. The 96.6% grasping result is remarkable for zero-shot. The remaining gap is in complex, multi-step, contact-rich tasks — but the trajectory is clear. "Dominant paradigm" means >50% of new deployments use foundation models as the primary reasoning layer.
Depends on: foundation-models-for-robotics, humanoid-loco-manipulation, tactile-sensing
Would change if: Long-horizon task success rates stagnate below 80%, or if a non-VLM approach (e.g., pure RL + world models) achieves superior performance.