Predicting the effect of robot actions on the environment is a fundamental challenge in robotics. Current methods use visual and robot action data to generate video predictions over a given time horizon, which can then be used to adjust robot actions. Humans, however, rely on both visual and tactile feedback to develop and maintain a mental model of their physical surroundings. In this paper, we investigate the impact of integrating tactile feedback into video prediction models for physical robot interactions. We propose three multi-modal integration approaches and compare the performance of the resulting tactile-enhanced video prediction models. Additionally, we introduce two new robot-pushing datasets, recorded with a magnetic-based tactile sensor, for unsupervised learning. The first dataset contains visually identical objects with different physical properties, while the second mimics existing robot-pushing datasets of household object clusters. Our results demonstrate that incorporating tactile feedback into video prediction models improves scene prediction accuracy and enhances the agent's perception of physical interactions and its understanding of cause-effect relationships.