Enabling robots to effectively imitate expert skills in long-horizon tasks such as locomotion and manipulation is a long-standing challenge. Existing imitation learning (IL) approaches for robots still perform sub-optimally on complex tasks. In this paper, we consider how this challenge can be addressed using human cognitive priors. By introducing intuitive human cognitive priors, we extend the usual notion of action to a dual Cognition (high-level)-Action (low-level) architecture, and propose a novel skill IL framework based on human-robot interaction, called Cognition-Action-based Skill Imitation Learning (CasIL), which lets a robotic agent effectively cognize and imitate critical skills from raw visual demonstrations. CasIL imitates both cognition and action, with high-level skill cognition explicitly guiding low-level primitive actions, which makes the entire skill IL process more robust and reliable. We evaluate our method on the MuJoCo and RLBench benchmarks, as well as on obstacle avoidance and point-goal navigation tasks for quadrupedal robot locomotion. Experimental results show that CasIL consistently achieves competitive and robust skill imitation compared with other methods across a variety of long-horizon robotic tasks.