Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One key approach to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport (OT) methods have been widely used in imitation learning because they provide meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state(-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance to combine multiple, diverse state trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts. We analyze its efficiency on OpenAI Gym control environments and show that the standard method is not always optimal.
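The contrast between concatenating demonstrations and averaging them in the OT sense can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes scalar states, two experts with the same number of samples, uniform weights, a squared-distance cost, and it ignores time ordering. Under those assumptions the multi-marginal OT problem on the real line is solved by matching sorted samples rank by rank, so the barycenter is the rank-wise mean. All function names and data below are hypothetical.

```python
def concatenate(trajs):
    """Baseline: pool all expert states into one empirical distribution."""
    return sorted(s for traj in trajs for s in traj)


def barycenter_1d(trajs):
    """Wasserstein-2 barycenter of equal-size, uniformly weighted 1D samples.

    In 1D with squared cost, the optimal multi-marginal coupling pairs
    samples of equal rank, so the barycenter support is the rank-wise mean.
    """
    n = len(trajs[0])
    if any(len(t) != n for t in trajs):
        raise ValueError("this sketch assumes equal-length trajectories")
    sorted_trajs = [sorted(t) for t in trajs]
    return [sum(col) / len(trajs) for col in zip(*sorted_trajs)]


def w2_squared_1d(xs, ys):
    """Squared W2 distance between equal-size, uniformly weighted 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Two hypothetical experts whose visited states form two separate modes.
expert_a = [-1.2, -1.0, -0.8, -1.1]
expert_b = [0.9, 1.0, 1.2, 1.1]

pooled = concatenate([expert_a, expert_b])   # keeps both modes, 8 points
bary = barycenter_1d([expert_a, expert_b])   # one mode between them, 4 points

print("pooled:    ", pooled)
print("barycenter:", bary)
print("W2^2(bary, A):", w2_squared_1d(bary, expert_a))
print("W2^2(bary, B):", w2_squared_1d(bary, expert_b))
```

In this toy case the pooled target stays bimodal, while the barycenter is a single set of states lying between the two experts and at equal W2 distance from each. For vector-valued states or unequal trajectory lengths, the rank-matching shortcut no longer applies and a general OT solver would be needed.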