Imitation learning (IL), which aims to learn optimal control policies from expert demonstrations, has been an effective method for robot manipulation tasks. However, previous IL methods either use only expensive expert demonstrations and ignore imperfect ones, or rely on interacting with the environment and learning from online experience. In the context of robotic manipulation, we address both challenges and propose a novel framework named Similarity Weighted Behavior Transformer (SWBT). SWBT learns effectively from both expert and imperfect demonstrations without interacting with the environment. We show that easy-to-obtain imperfect demonstrations carry useful information, such as forward and inverse dynamics, that significantly enhances the network. To the best of our knowledge, we are the first to integrate imperfect demonstrations into the offline imitation learning setting for robot manipulation tasks. Extensive experiments on the ManiSkill2 benchmark, built on the high-fidelity Sapien simulator, and on real-world robotic manipulation tasks demonstrate that the proposed method extracts better features and improves the success rate on all tasks. Our code will be released upon acceptance of the paper.