Imitation learning aims to mimic expert behavior without explicit reward signals. Passive imitation learning methods, which rely on static expert datasets, typically suffer from compounding error, low sample efficiency, and high hyper-parameter sensitivity. In contrast, active imitation learning methods solicit expert interventions to address these limitations. However, recent active imitation learning methods are designed from human intuition or empirical experience and lack theoretical guarantees. In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process. By solving the optimization objective of this framework, we derive a practical implementation, which we name AdapMen. Theoretical analysis shows that AdapMen can improve the error bound and avoid compounding error under mild conditions. Experiments on the MetaDrive benchmark and Atari 2600 games validate our theoretical analysis and show that our method achieves near-expert performance with much less expert involvement and far fewer total sampling steps than previous methods. The code is available at https://github.com/liuxhym/AdapMen.
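To make the teacher-student interaction concrete, the following is a minimal sketch of a generic intervention-based active imitation learning loop. It is not the AdapMen algorithm: the abstract does not state the intervention criterion, so the rule used here (the teacher takes over whenever the student's proposed action differs from the expert's) is an assumption, as are the toy chain environment and the names `expert_policy`, `TabularStudent`, `run_episode`, and `train`.

```python
from collections import Counter, defaultdict

N_STATES = 6  # hypothetical chain environment: states 0..5, goal at state 5


def expert_policy(state):
    """Hypothetical expert: always step right, toward the goal."""
    return +1


class TabularStudent:
    """Student that imitates by majority vote over expert labels per state."""

    def __init__(self, default_action=-1):
        self.default_action = default_action  # used for states never labelled
        self.labels = defaultdict(Counter)    # state -> Counter of expert actions

    def act(self, state):
        votes = self.labels.get(state)
        if not votes:
            return self.default_action
        return votes.most_common(1)[0][0]

    def update(self, demonstrations):
        for state, action in demonstrations:
            self.labels[state][action] += 1


def run_episode(student, max_steps=20):
    """Roll out the student; the teacher takes over on disagreement.

    Assumed intervention rule (not AdapMen's): intervene whenever the
    student's proposed action differs from the expert's action.
    Returns the (state, expert_action) pairs collected during interventions.
    """
    state, demos = 0, []
    for _ in range(max_steps):
        if state == N_STATES - 1:
            break
        proposed = student.act(state)
        expert_action = expert_policy(state)
        if proposed != expert_action:
            demos.append((state, expert_action))  # teacher labels this state
            action = expert_action                # and controls this step
        else:
            action = proposed
        state = min(max(state + action, 0), N_STATES - 1)
    return demos


def train(n_episodes=3):
    """Alternate rollouts with student updates; track interventions per episode."""
    student = TabularStudent()
    interventions = []
    for _ in range(n_episodes):
        demos = run_episode(student)
        student.update(demos)
        interventions.append(len(demos))
    return student, interventions


if __name__ == "__main__":
    _, counts = train()
    print(counts)  # [5, 0, 0]
```

In this toy setting the teacher intervenes at every non-goal state during the first episode and not at all afterwards, which illustrates how expert involvement can shrink once the student has been corrected on the states it actually visits. A theoretically grounded criterion such as the one the paper derives would replace the simple disagreement check in `run_episode`.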