Text2Touch: LLM-Designed Reward Functions for Tactile In-Hand Rotation

Source: https://arxiv.org/abs/2509.07445
Type: paper
ArXiv ID: 2509.07445
Date Ingested: 2026-04-05T20:00:00Z
Tags: tactile-sensing, llm-reward-design, reinforcement-learning, dexterous-manipulation

Key Contribution

LLMs autonomously design reward functions for in-hand rotation policies that run on a real Allegro Hand fitted with TacTip tactile sensors. The work shows that language models can bridge the gap between high-level task descriptions and low-level tactile manipulation rewards.

Summary

Text2Touch introduces a framework in which large language models (LLMs) automatically generate reward functions for training dexterous manipulation policies. Hand-engineering reward signals is traditionally one of the hardest parts of RL for manipulation. Instead, the system prompts an LLM with a natural language task description and has it design an appropriate reward function.

Technical Pipeline

Task specification: Natural language description of desired manipulation behavior (e.g., "rotate the object 90 degrees clockwise")
LLM reward design: LLM generates a reward function incorporating tactile sensor readings, joint states, and object pose
Policy training: RL agent trained in simulation using the LLM-designed reward
Sim-to-real transfer: Trained policy deployed on physical Allegro Hand with TacTip sensors
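
The reward-design step above can be made concrete with a small example. The sketch below shows the kind of function an LLM might emit for a rotation task, combining object motion, tactile contact, and joint state. It is illustrative only and not taken from the paper: the observation fields, the weights, and the drop penalty are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepObservation:
    """Hypothetical per-step signals; the field names are illustrative only."""
    angular_velocity: Sequence[float]  # object angular velocity in rad/s (x, y, z)
    contact_flags: Sequence[bool]      # one binary tactile contact flag per fingertip
    joint_velocities: Sequence[float]  # hand joint velocities in rad/s
    object_dropped: bool               # True once the object has left the hand


def rotation_reward(obs, target_axis=(0.0, 0.0, 1.0),
                    w_rotate=1.0, w_contact=0.1, w_smooth=0.01,
                    drop_penalty=10.0):
    """Score one simulation step of an in-hand rotation task.

    All weights are assumed placeholder values, not tuned numbers.
    """
    if obs.object_dropped:
        return -drop_penalty

    # Reward angular velocity projected onto the desired rotation axis.
    spin = sum(w * a for w, a in zip(obs.angular_velocity, target_axis))

    # Reward the fraction of fingertips whose tactile sensor reports contact.
    n_tips = len(obs.contact_flags)
    contact = sum(obs.contact_flags) / n_tips if n_tips else 0.0

    # Penalize fast joint motion to encourage smooth finger gaits.
    effort = sum(v * v for v in obs.joint_velocities)

    return w_rotate * spin + w_contact * contact - w_smooth * effort
```

An RL training loop would call `rotation_reward` once per simulated step. In an LLM-driven setup, the body of such a function is what the model writes, while the observation interface is fixed by the environment.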

Hardware

Allegro Hand: 4-finger dexterous hand (16 DOF)
TacTip sensors: Biomimetic optical tactile sensors on fingertips providing contact geometry through internal pin deformation tracking
Real-world deployment: Policies trained with LLM rewards successfully transfer to physical hardware
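
To illustrate what "pin deformation tracking" means for a TacTip-style sensor, the sketch below compares pin positions in a reference (no-contact) frame with positions in the current frame and summarizes the displacement. It is a minimal geometric illustration, not the tactile processing used in the paper. The function names, the pixel units, and the contact threshold are assumptions, and it takes the pin coordinates as already extracted from the camera image.

```python
import math


def pin_displacements(reference, current):
    """Return the (dx, dy) shift of each pin between two frames.

    `reference` and `current` are equal-length lists of (x, y) pin
    centers in pixels, listed in the same pin order.
    """
    if len(reference) != len(current):
        raise ValueError("frames must contain the same number of pins")
    return [(cx - rx, cy - ry)
            for (rx, ry), (cx, cy) in zip(reference, current)]


def contact_summary(reference, current, threshold=0.5):
    """Summarize pin motion as a coarse contact signal.

    `threshold` (pixels) is an assumed value for deciding contact.
    """
    shifts = pin_displacements(reference, current)
    if not shifts:
        return {"in_contact": False, "mean_magnitude": 0.0,
                "mean_shift": (0.0, 0.0)}

    n = len(shifts)
    magnitudes = [math.hypot(dx, dy) for dx, dy in shifts]
    mean_magnitude = sum(magnitudes) / n
    # The average shift direction gives a rough cue for shear on the skin.
    mean_shift = (sum(dx for dx, _ in shifts) / n,
                  sum(dy for _, dy in shifts) / n)
    return {"in_contact": mean_magnitude > threshold,
            "mean_magnitude": mean_magnitude,
            "mean_shift": mean_shift}
```

A summary like this could feed a binary contact flag or a shear estimate into a policy observation or a reward term. A real sensor pipeline would also need pin detection and matching across frames.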

Key Results

LLM-designed rewards achieve comparable or superior performance to manually designed rewards
Tactile information is critical: the LLM naturally incorporates tactile features into its reward design
Zero-shot reward generation: no iterative tuning needed for new tasks
Successfully demonstrates real-world in-hand rotation with tactile feedback

Significance

This work represents a convergence of two major trends: (1) LLMs as automated engineers for robotics, and (2) tactile sensing for dexterous manipulation. By removing the reward engineering bottleneck, it dramatically lowers the barrier to training new manipulation skills. The fact that LLMs naturally leverage tactile signals in reward design suggests they have internalized useful priors about contact-rich manipulation.
