Deep imitation learning is promising for dexterous manipulation tasks because it requires neither an environment model nor pre-programmed robot behavior. However, applying it to dual-arm manipulation tasks remains challenging. In a dual-arm setup, the additional manipulator increases the number of state dimensions, which introduces distractions and degrades the performance of the neural networks. We address this issue with a self-attention mechanism that computes dependencies between elements of a sequential input and focuses on the important ones. Specifically, we apply a Transformer, an architecture built on self-attention, to deep imitation learning and evaluate it on dual-arm manipulation tasks with a real robot. The experimental results show that the Transformer-based deep imitation learning architecture attends to the important features among the sensory inputs, thereby reducing distractions and improving manipulation performance compared with a baseline architecture without self-attention.
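To make the self-attention idea concrete, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention over a short sequence of input tokens. It is an illustration only, not the architecture evaluated in this work: the token count, the embedding size, the random projection matrices, and the reading of tokens as sensory inputs (for example, one token per arm state or image feature) are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (n_tokens, d_model) array, one row per input element.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns the attended outputs and the attention weight matrix.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    # Pairwise dependency scores between every pair of tokens.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Each row is a distribution over the tokens that row attends to.
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical setup: 4 tokens standing in for sensory inputs,
# each embedded in 8 dimensions. All values are random placeholders.
rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
tokens = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` sums to one, so a token that assigns most of its weight to a few other tokens is effectively ignoring the rest. That is the sense in which attention can suppress distracting inputs.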