Fine-grained and contact-rich manipulation remains challenging for robots, largely because tactile feedback is underutilized. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, during early sampling, the policy produces a coarse, visually plausible action using only visual inputs. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action so that it aligns with realistic physical contact conditions. Trained via contrastive learning on limited expert demonstrations, the CPM outputs a tactile-informed feasibility score that steers the sampling process toward refined actions satisfying physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality yet cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability: its rigid fingertips provide direct tactile feedback, enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
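The two-stage inference procedure can be sketched as a guided sampling loop. The sketch below is illustrative, not the authors' implementation: `denoise_step` stands in for the pre-trained visuomotor policy's denoising/flow update, and `cpm_grad` for the gradient of the CPM's tactile feasibility score; the stage split and guidance scale are assumed hyperparameters.

```python
import numpy as np

def guided_sample(denoise_step, cpm_grad, n_steps=50, switch_frac=0.5,
                  guide_scale=0.1, horizon=16, action_dim=7, seed=0):
    """Sketch of TouchGuide-style two-stage guided sampling (names illustrative).

    Stage 1 (early steps): vision-only denoising produces a coarse,
    visually plausible action trajectory.
    Stage 2 (late steps): the gradient of the CPM's tactile feasibility
    score nudges the trajectory toward contact-consistent actions.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((horizon, action_dim))   # start from noise
    switch = int(n_steps * switch_frac)              # assumed stage boundary
    for t in reversed(range(n_steps)):
        a = denoise_step(a, t)                       # coarse visual update
        if t < switch:                               # late steps: tactile guidance
            a = a + guide_scale * cpm_grad(a)        # ascend the feasibility score
    return a
```

Because the guidance acts in the low-dimensional action space rather than on raw observations, the CPM only needs to score candidate actions, which is what makes training it on limited demonstrations plausible.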