T3 (never reviewed)
Foundation models (VLMs) will replace task-specific training as the dominant control paradigm within 3 years
Conviction: 6.0/10
Trajectory: no history yet
Last reviewed: —
Humanoid-COA's 96.6% grasping and 90% mobile pick via GPT-4V — with zero task-specific training — is the strongest evidence that the future of robot control is general-purpose reasoning, not per-task RL policies.
Confidence: 7/10
Supporting evidence:
- 96.6% zero-shot grasping on physical Unitree robots via foundation models. Evidence: strong (Humanoid-COA)
- LLMs autonomously designing reward functions for tactile manipulation. Evidence: strong (Text2Touch)
- Pattern: VLM reasoning -> task decomposition -> pre-trained execution becoming standard. Evidence: strong (Frontier)
- Teleoperation data + simulation scaling works for bootstrapping. Evidence: moderate (1X NEO, Figure)
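The "VLM reasoning -> task decomposition -> pre-trained execution" pattern above can be sketched as a control loop: a VLM turns an instruction (plus camera input) into a sequence of calls against a library of pre-trained skills. A minimal sketch, assuming hypothetical names throughout (`decompose`, `SKILL_LIBRARY`, `SkillCall` are illustrative, not from Humanoid-COA or any cited system):

```python
from dataclasses import dataclass, field


@dataclass
class SkillCall:
    """One step emitted by the reasoning layer: a skill name and its arguments."""
    name: str
    args: dict = field(default_factory=dict)


# Pre-trained low-level skills the planner composes. Stand-ins for learned
# policies (grasping, locomotion, placement) that run on the robot.
SKILL_LIBRARY = {
    "locate": lambda target: {"pose": (0.4, 0.1, 0.8)},
    "grasp": lambda pose: {"grasped": True},
    "place": lambda pose: {"placed": True},
}


def decompose(instruction: str) -> list[SkillCall]:
    """Stand-in for the VLM call: instruction + image in, skill sequence out."""
    # A real system would send the camera frame and instruction to a VLM API
    # here and parse its structured output; this canned plan is for illustration.
    return [
        SkillCall("locate", {"target": "cup"}),
        SkillCall("grasp", {"pose": (0.4, 0.1, 0.8)}),
        SkillCall("place", {"pose": (0.0, 0.5, 0.8)}),
    ]


def run(instruction: str) -> list[str]:
    """Execute the decomposed plan against the skill library."""
    executed = []
    for call in decompose(instruction):
        SKILL_LIBRARY[call.name](*call.args.values())
        executed.append(call.name)
    return executed


print(run("put the cup on the shelf"))
```

The key property this sketch captures is that the reasoning layer is general-purpose while the execution layer stays pre-trained and fixed, which is why no task-specific training is needed per new task. The API-latency risk noted below sits in the `decompose` call.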
Challenging evidence:
- Long-horizon combined tasks still only reach 56-63% success; that gap is significant for production use
- Dependence on external APIs introduces latency and availability risks
- Contact-rich manipulation (assembly, cooking) may still require specialized policies
- LLM-designed reward functions are untested for fine force control tasks
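To make the last point concrete: Text2Touch-style systems have the LLM emit reward code for tactile tasks, which is then validated in simulation before training. A minimal sketch of the *shape* of such a reward, with invented terms and weights (not the actual Text2Touch rewards), shows why fine force control is the untested case:

```python
def llm_proposed_reward(contact_force: float, target_force: float,
                        slip: float) -> float:
    """Illustrative reward an LLM might emit as code for a tactile grasp:
    penalize deviation from a target contact force and penalize slip.
    Weights (1.0, 5.0) are invented for this sketch."""
    force_err = abs(contact_force - target_force)
    return -1.0 * force_err - 5.0 * slip


# For fine force control the margin between "firm grasp" and "crushed object"
# is small, so hand-picked penalty weights like these may shape behavior badly;
# that is the untested regime flagged above.
print(llm_proposed_reward(contact_force=1.5, target_force=1.0, slip=0.1))
```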
Evolution:
- Apr 5, 2026 — Initial thesis at 7/10. The 96.6% grasping result is remarkable for zero-shot. The remaining gap is in complex, multi-step, contact-rich tasks — but the trajectory is clear. "Dominant paradigm" means >50% of new deployments use foundation models as the primary reasoning layer.
Depends on: foundation-models-for-robotics, humanoid-loco-manipulation, tactile-sensing
Would change if: Long-horizon task success rates stagnate below 80%, or if a non-VLM approach (e.g., pure RL + world models) achieves superior performance.