Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. We perform comprehensive ablations on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/.
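To make the idea of a point cloud-based tactile representation concrete, the following is a minimal sketch of one way touch readings could be placed in the same 3D space as camera points. The abstract does not specify the construction, so everything here is an assumption for illustration: the function name `fuse_visuotactile_cloud`, the binary contact flags, the availability of tactile sensor positions in the camera point cloud's frame (for example, from forward kinematics), and the two-channel modality label appended to each point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Layout (assumed): x, y, z, is_camera_point, is_tactile_point
LabeledPoint = Tuple[float, float, float, float, float]


def fuse_visuotactile_cloud(
    camera_points: Sequence[Point],
    sensor_positions: Sequence[Point],
    in_contact: Sequence[bool],
) -> List[LabeledPoint]:
    """Merge camera points and active tactile sensor positions into one cloud.

    Hypothetical helper: all inputs are assumed to share one coordinate frame,
    and each tactile sensor is assumed to report a binary contact signal.
    """
    if len(sensor_positions) != len(in_contact):
        raise ValueError("need exactly one contact flag per tactile sensor")

    fused: List[LabeledPoint] = []

    # Camera points keep their coordinates and get the "vision" label.
    for x, y, z in camera_points:
        fused.append((x, y, z, 1.0, 0.0))

    # Only sensors currently in contact contribute a point, labeled "touch".
    for (x, y, z), touching in zip(sensor_positions, in_contact):
        if touching:
            fused.append((x, y, z, 0.0, 1.0))

    return fused


if __name__ == "__main__":
    cloud = fuse_visuotactile_cloud(
        camera_points=[(0.0, 0.0, 0.1), (0.01, 0.0, 0.1)],
        sensor_positions=[(0.02, 0.0, 0.05), (0.03, 0.0, 0.05)],
        in_contact=[True, False],
    )
    for point in cloud:
        print(point)
```

Under these assumptions, a single point cloud encoder could consume both modalities at once, since touch is expressed as labeled 3D points rather than as a separate signal vector.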