Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. However, these methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining single versatile policies with controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing the discriminability of these patterns. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and faithfully replicate the diverse skills encoded in the demonstrations.