While imitation learning provides us with an efficient toolkit to train robots, learning skills that are robust to environment variations remains a significant challenge. Current approaches address this challenge by relying either on large numbers of demonstrations that span environment variations or on handcrafted reward functions that require state estimates. Neither direction scales to fast imitation. In this work, we present Fast Imitation of Skills from Humans (FISH), a new imitation learning approach that can learn robust visual skills from less than a minute of human demonstrations. Given a weak base policy trained by offline imitation of the demonstrations, FISH computes rewards that correspond to the "match" between the robot's behavior and the demonstrations. These rewards are then used to adaptively update a residual policy that is added on top of the base policy. Across all tasks, FISH requires at most twenty minutes of interactive learning to imitate demonstrations on object configurations that were not seen in the demonstrations. Importantly, FISH is constructed to be versatile, which allows it to be used across robot morphologies (e.g., xArm, Allegro, Stretch) and camera configurations (e.g., third-person, eye-in-hand). Our experimental evaluations on 9 different tasks show that FISH achieves an average success rate of 93%, which is around 3.8x higher than prior state-of-the-art methods.
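The two mechanisms named in the abstract, a match-based reward and a residual policy added to a base policy, can be illustrated with a minimal sketch. The abstract does not specify how the match is computed, so the nearest-neighbor cosine distance used here is only an assumed stand-in, and the function names `match_reward` and `combined_action`, the feature shapes, and the action bounds are all hypothetical.

```python
import numpy as np


def match_reward(robot_feats, demo_feats, eps=1e-8):
    """Per-step reward from how closely robot features match a demonstration.

    ASSUMPTION: nearest-neighbor cosine distance is an illustrative stand-in;
    the abstract does not say which matching function FISH uses.
    robot_feats: (T_robot, D) array, demo_feats: (T_demo, D) array.
    """
    r = robot_feats / (np.linalg.norm(robot_feats, axis=1, keepdims=True) + eps)
    d = demo_feats / (np.linalg.norm(demo_feats, axis=1, keepdims=True) + eps)
    cost = 1.0 - r @ d.T          # pairwise cosine distance, shape (T_robot, T_demo)
    return -cost.min(axis=1)      # a closer match gives a higher (less negative) reward


def combined_action(base_action, residual_action, low=-1.0, high=1.0):
    """Add a residual correction to the base policy's action.

    ASSUMPTION: actions are bounded in [low, high]; the bounds are hypothetical.
    """
    return np.clip(base_action + residual_action, low, high)


if __name__ == "__main__":
    demo = np.array([[1.0, 0.0], [0.0, 1.0]])
    rollout = np.array([[1.0, 0.0], [-1.0, 0.0]])
    print(match_reward(rollout, demo))
    print(combined_action(np.array([0.9, -0.2]), np.array([0.3, 0.1])))
```

In this sketch the base policy stays fixed while only the residual term would be updated from the rewards, which is one plausible reading of "a residual policy that is added on top of the base policy"; the update rule itself is left out because the abstract does not describe it.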