In dynamic environments, robots often encounter constrained motion trajectories when manipulating objects with specific kinematic properties, such as doors. Applying the appropriate force is therefore crucial to prevent damage to both the robot and the object. However, current vision-guided robot state generation methods often falter in this regard because they lack tactile perception. To address this issue, this paper introduces a novel state diffusion framework termed SafeDiff. It generates a prospective state sequence from the current robot state and visual context observation, while incorporating real-time tactile feedback to refine the sequence. To the best of our knowledge, this is the first study specifically focused on ensuring force safety in robotic manipulation. This refinement significantly improves the rationality of state planning, and a safe action trajectory is then derived from the refined plan via inverse dynamics. In practice, unlike previous approaches that concatenate visual and tactile data to generate future robot state sequences, our method uses tactile data as a calibration signal that implicitly adjusts the robot's state within the state space. Additionally, we have developed a large-scale simulation dataset, SafeDoorManip50k, offering extensive multimodal data for training and evaluating the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces in door opening, in both simulated and real-world settings.
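To make the described pipeline concrete, the following is a minimal toy sketch of the idea of using tactile feedback as a calibration signal on a planned state sequence, with actions then recovered via inverse dynamics. All names (`refine_states`, `inverse_dynamics`), the correction rule, and the force threshold are hypothetical illustrations, not the actual SafeDiff architecture, which uses a learned diffusion model.

```python
import numpy as np

def inverse_dynamics(states):
    # Toy inverse dynamics: actions as finite differences of consecutive states.
    return np.diff(states, axis=0)

def refine_states(states, tactile_force, force_limit=5.0, gain=0.1, steps=10):
    """Hypothetical tactile-calibrated refinement of a planned state sequence.

    states:        (T, D) state sequence from a vision-conditioned generator.
    tactile_force: (T,) sensed contact-force magnitudes along the plan.
    Timesteps whose force exceeds force_limit are nudged toward the preceding
    (safe) state, mimicking an implicit calibration signal in state space.
    """
    refined = states.copy()
    for _ in range(steps):
        unsafe = np.where(tactile_force > force_limit)[0]
        for t in unsafe:
            if t > 0:
                # Pull the unsafe state a fraction of the way toward its predecessor.
                refined[t] += gain * (refined[t - 1] - refined[t])
    return refined

# Usage: a straight-line 1-D plan with an unsafe force spike at t = 3.
plan = np.linspace(0.0, 1.0, 6)[:, None]             # (6, 1) planned states
force = np.array([1.0, 1.0, 1.0, 8.0, 1.0, 1.0])     # exceeds the limit at t = 3
safe_plan = refine_states(plan, force)
actions = inverse_dynamics(safe_plan)                # (5, 1) safe action trajectory
```

The design point the sketch illustrates is that the tactile signal does not get concatenated with the visual input; it only modulates the already-generated states, which is the calibration-style coupling the abstract contrasts with concatenation-based fusion.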