Imitation learning has been widely applied to autonomous systems thanks to recent developments in interactive algorithms that address the covariate shift and compounding errors induced by traditional approaches such as behavior cloning. However, existing interactive imitation learning methods assume access to a single perfect expert, whereas in practice it is more likely that multiple imperfect experts are available. In this paper, we propose MEGA-DAgger, a new DAgger variant suited to interactive learning with multiple imperfect experts. First, unsafe demonstrations are filtered out while the training data is aggregated, so that imperfect demonstrations have little influence on the training of the novice policy. Next, experts are evaluated and compared on scenario-specific metrics to resolve conflicting labels among them. Through experiments in autonomous racing scenarios, we demonstrate that a policy learned using MEGA-DAgger can outperform both the experts and policies learned using state-of-the-art interactive imitation learning algorithms. The supplementary video can be found at https://youtu.be/pYQiPSHk6dU.
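The two steps named in the abstract, filtering unsafe demonstrations during aggregation and resolving conflicting expert labels with a scenario-specific metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Label` record, the `is_safe` predicate, the `score` field, and the rule "keep the highest-scoring safe label" are all hypothetical choices made here for illustration, and the abstract does not specify the actual safety check or metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


@dataclass
class Label:
    """One expert's label for a state (hypothetical record type)."""
    state: State
    action: Action
    expert_id: int
    score: float  # assumed scenario-specific metric, higher is better


def filter_unsafe(labels: List[Label],
                  is_safe: Callable[[State, Action], bool]) -> List[Label]:
    """Drop labels whose action is judged unsafe in its state."""
    return [lab for lab in labels if is_safe(lab.state, lab.action)]


def resolve_conflict(labels: List[Label]) -> Label:
    """Pick one label among conflicting ones (assumed rule: best score wins)."""
    return max(labels, key=lambda lab: lab.score)


def aggregate(dataset: List[Label],
              labels_for_state: List[Label],
              is_safe: Callable[[State, Action], bool]) -> List[Label]:
    """Append at most one safe, conflict-resolved label to the dataset."""
    safe = filter_unsafe(labels_for_state, is_safe)
    if safe:  # if every expert label is unsafe, nothing is added
        dataset.append(resolve_conflict(safe))
    return dataset


if __name__ == "__main__":
    # Toy setup: state = (gap to obstacle,), action = (speed,).
    # Assumed safety rule: speed must not exceed the gap.
    def safe(state: State, action: Action) -> bool:
        return action[0] <= state[0]

    labels = [
        Label((1.0,), (0.5,), expert_id=0, score=0.5),
        Label((1.0,), (2.0,), expert_id=1, score=2.0),  # unsafe, filtered out
        Label((1.0,), (0.8,), expert_id=2, score=0.8),
    ]
    data = aggregate([], labels, safe)
    print(data[0].expert_id, data[0].action)
```

In this toy case the fastest label is discarded because it violates the assumed safety rule, and the remaining conflict is settled in favor of the higher-scoring safe expert. A real system would replace both the predicate and the score with the task's own safety check and metrics.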