Imitation learning has been applied to a range of robotic tasks, but can struggle when (1) robots encounter edge cases that are not represented in the training data (distribution shift) or (2) the human demonstrations are heterogeneous: taking different paths around an obstacle, for instance (multimodality). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human teleoperators during task execution and learn from them over time, but is not equipped to handle multimodality. Recent work proposes Implicit Behavior Cloning (IBC), which is able to represent multimodal demonstrations using energy-based models (EBMs). In this work, we propose addressing both multimodality and distribution shift with Implicit Interactive Fleet Learning (IIFL), the first extension of implicit policies to interactive imitation learning (including the single-robot, single-human setting). IIFL quantifies uncertainty using a novel application of Jeffreys divergence to EBMs. While IIFL is more computationally expensive than explicit methods, results suggest that IIFL achieves 4.5x higher return on human effort in simulation experiments and an 80% higher success rate in a physical block pushing task over (Explicit) IFL, IBC, and other baselines when human supervision is heterogeneous.
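To make the uncertainty measure concrete, the following is a minimal sketch, not the authors' implementation, of why Jeffreys divergence suits EBMs. For two models with densities proportional to exp(-E1) and exp(-E2), the symmetric sum KL(p1||p2) + KL(p2||p1) reduces to E_{p1}[E2 - E1] + E_{p2}[E1 - E2], so the intractable partition functions cancel. The finite action grid, the energy values, and all function names below are assumptions chosen for illustration; in a continuous action space the expectations would have to be estimated from samples.

```python
import math

def boltzmann(energies):
    """Normalize exp(-E) over a finite (hypothetical) action grid."""
    shift = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - shift)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def jeffreys_from_energies(e1, e2):
    """Jeffreys divergence written purely in terms of energy differences."""
    p1, p2 = boltzmann(e1), boltzmann(e2)
    term1 = sum(p * (b - a) for p, a, b in zip(p1, e1, e2))  # E_{p1}[E2 - E1]
    term2 = sum(p * (a - b) for p, a, b in zip(p2, e1, e2))  # E_{p2}[E1 - E2]
    return term1 + term2

def jeffreys_from_probs(p, q):
    """Reference definition: KL(p||q) + KL(q||p) on normalized probabilities."""
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))

# Illustrative energies over five candidate actions (made-up numbers).
# Both models are bimodal, but model_b prefers a different mode than model_a.
model_a = [0.2, 2.0, 3.0, 2.0, 0.3]
model_b = [2.5, 2.0, 3.0, 0.4, 0.2]

disagreement = jeffreys_from_energies(model_a, model_b)
print("Jeffreys divergence between the two models:", disagreement)
```

Adding a constant to every energy of one model leaves the result unchanged, which is the practical point: two implicit policies can be compared without ever computing a normalizer. A larger value indicates stronger disagreement between the models at a given state, which is the kind of signal an interactive method could use when deciding whether to request human help.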