Imitation learning has been applied to a range of robotic tasks, but it can struggle when robots encounter edge cases that are not represented in the training data (i.e., distribution shift). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human supervisors during task execution and learn from them over time, but different supervisors may demonstrate the task in different ways. Recent work proposes Implicit Behavior Cloning (IBC), which can represent multimodal demonstrations using energy-based models (EBMs). In this work, we propose Implicit Interactive Fleet Learning (IIFL), an algorithm that builds on IBC for interactive imitation learning from multiple heterogeneous human supervisors. A key insight in IIFL is a novel approach for uncertainty quantification in EBMs using Jeffreys divergence. While IIFL is more computationally expensive than explicit methods, results suggest that IIFL achieves a 2.8x higher success rate in simulation experiments and a 4.5x higher return on human effort in a physical block-pushing task than (Explicit) IFL, IBC, and other baselines.
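The Jeffreys divergence is the symmetrized KL divergence, D_J(p, q) = KL(p || q) + KL(q || p). The following snippet is a hypothetical illustration of that quantity, not IIFL's actual uncertainty estimator. It assumes that each EBM's implicit policy can be approximated by a softmax over negative energies on a shared, finite set of candidate actions. It also assumes that uncertainty is scored as the mean pairwise Jeffreys divergence across an ensemble of such EBMs. All function names (`energies_to_probs`, `jeffreys`, `ensemble_disagreement`) are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence


def energies_to_probs(energies: Sequence[float]) -> List[float]:
    """Assumption: approximate an EBM's policy on a finite candidate-action
    set as p(a) proportional to exp(-E(a)), i.e. a softmax over -energy."""
    neg = [-e for e in energies]
    m = max(neg)  # subtract the max before exponentiating for stability
    exps = [math.exp(x - m) for x in neg]
    z = sum(exps)
    return [x / z for x in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0
    (true for softmax outputs unless exp underflows to zero)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys(p: Sequence[float], q: Sequence[float]) -> float:
    """Jeffreys divergence: the symmetrized KL, KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def ensemble_disagreement(energy_sets: Sequence[Sequence[float]]) -> float:
    """Hypothetical uncertainty score: mean pairwise Jeffreys divergence
    between ensemble members, each given as energies over the SAME
    candidate actions (in the same order)."""
    if len(energy_sets) < 2:
        raise ValueError("need at least two ensemble members")
    probs = [energies_to_probs(e) for e in energy_sets]
    pairs = list(combinations(probs, 2))
    return sum(jeffreys(p, q) for p, q in pairs) / len(pairs)


# Usage: two members that agree vs. two that prefer opposite actions.
print(ensemble_disagreement([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]))  # 0.0
print(ensemble_disagreement([[0.0, 5.0], [5.0, 0.0]]))  # roughly 9.87
```

Identical energy profiles yield a score of 0. Members that prefer different actions yield a large positive score. Because the Jeffreys divergence is symmetric, the score does not depend on the order in which two members are compared. Plain KL divergence does not have this property.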