Imitation learning enables autonomous agents to learn from human examples without the need for a reward signal. However, if the provided dataset does not capture the task correctly, or the task is too complex to model, such agents fail to reproduce the expert policy. We propose to recover from these failures through online adaptation. Our approach combines the action proposal coming from a pre-trained policy with relevant experience recorded by an expert. The combination results in an adapted action that closely follows the expert. Our experiments show that an adapted agent performs better than its pure imitation learning counterpart. Notably, adapted agents can achieve reasonable performance even when the base, non-adapted policy catastrophically fails.
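The adaptation step described above could be sketched as follows. This is a minimal illustrative example only: the function and parameter names (`adapt_action`, `base_policy`, `alpha`) and the nearest-neighbor retrieval with linear blending are assumptions for the sake of illustration, not the paper's actual method or interface.

```python
import numpy as np

def adapt_action(state, base_policy, expert_states, expert_actions, alpha=0.5):
    """Combine the policy's action proposal with relevant expert experience.

    Hypothetical sketch: retrieve the expert transition whose recorded state
    is closest to the current state, then blend its action with the proposal.
    """
    proposal = base_policy(state)
    # Find the expert state most similar to the current state.
    dists = np.linalg.norm(expert_states - state, axis=1)
    expert_action = expert_actions[np.argmin(dists)]
    # alpha = 0 keeps the raw policy proposal; alpha = 1 follows the expert.
    return (1 - alpha) * proposal + alpha * expert_action
```

With `alpha` close to 1, the adapted action tracks the expert even when the base policy's proposal is poor, which matches the recovery behavior the abstract describes.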