Bimanual robotic manipulation is an emerging and critical topic in the robotics community. Previous works primarily rely on integrated control models that take the perceptions and states of both arms as inputs to directly predict their actions. However, we argue that bimanual manipulation involves not only coordinated tasks but also various uncoordinated tasks that require no explicit cooperation during execution, such as grasping an object with the closest hand. Integrated control frameworks fail to consider such tasks because they enforce cooperation from the earliest inputs. In this paper, we propose a novel decoupled interaction framework that accounts for the characteristics of different tasks in bimanual manipulation. The key insight of our framework is to assign an independent model to each arm to enhance the learning of uncoordinated tasks, while introducing a selective interaction module that adaptively learns interaction weights from its own arm to improve the learning of coordinated tasks. Extensive experiments on seven tasks in the RoboTwin benchmark demonstrate that: (1) Our framework achieves outstanding performance, with a 23.5% boost over the SOTA method. (2) Our framework is flexible and can be seamlessly integrated into existing methods. (3) Our framework can be effectively extended to multi-agent manipulation tasks, achieving a 28% boost over the integrated control SOTA. (4) The performance boost stems from the decoupled design itself, surpassing the SOTA by 16.5% in success rate with only 1/6 of the model size.
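The decoupled design can be sketched as follows. This is a minimal illustrative mock under stated assumptions, not the authors' implementation: the per-arm feature and action dimensions, the gating form (a sigmoid gate computed from an arm's own features that modulates the other arm's features), and all class and function names (`ArmPolicy`, `mlp`, `forward`) are assumptions introduced for illustration; real policies would be trained networks rather than random-weight stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    # Random-weight layers standing in for trained networks (sketch only).
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for w, b in layers:
        x = np.tanh(x @ w + b)
    return x

FEAT, ACT = 32, 7  # assumed per-arm observation-feature and action dimensions

class ArmPolicy:
    """One independent policy per arm: the decoupled part of the design."""
    def __init__(self):
        self.encoder = mlp([FEAT, 64, 64])
        # Selective interaction module: weights learned from this arm's own features.
        self.gate = mlp([64, 64])
        self.head = mlp([128, 64, ACT])  # own + gated cross-arm features -> action

    def encode(self, obs):
        return forward(self.encoder, obs)

    def act(self, own_feat, other_feat):
        # A gate in (0, 1), computed from this arm's own features, decides how
        # much of the other arm's features to admit: near zero for uncoordinated
        # tasks, larger when coordination is needed.
        g = 1.0 / (1.0 + np.exp(-forward(self.gate, own_feat)))
        return forward(self.head, np.concatenate([own_feat, g * other_feat]))

# Each arm runs its own model; interaction happens only through the gated features.
left, right = ArmPolicy(), ArmPolicy()
obs_l, obs_r = rng.standard_normal(FEAT), rng.standard_normal(FEAT)
f_l, f_r = left.encode(obs_l), right.encode(obs_r)
a_l, a_r = left.act(f_l, f_r), right.act(f_r, f_l)
print(a_l.shape, a_r.shape)
```

Because each arm's action head only ever sees the other arm through the gate, the same two-policy structure extends naturally to more than two agents by gating each peer's features separately, which matches the multi-agent extension claimed in point (3).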