Developing autonomous vehicles that can navigate complex environments with human-level safety and efficiency is a central goal in self-driving research. A common approach to achieving this is imitation learning, where agents are trained to mimic human expert demonstrations collected from real-world driving scenarios. However, discrepancies between what the human demonstrator perceives and what the self-driving car's sensors observe can introduce an \textit{imitation gap}, leading to imitation learning failures. In this work, we introduce \textbf{IGDrivSim}, a benchmark built on top of the Waymax simulator and designed to investigate the effects of the imitation gap when learning autonomous driving policies from human expert demonstrations. Our experiments show that this perception gap between human experts and self-driving agents can hinder the learning of safe and effective driving behaviors. We further show that combining imitation learning with reinforcement learning, using a simple penalty reward on prohibited behaviors, effectively mitigates these failures. Our code is open-sourced at: https://github.com/clemgris/IGDrivSim.git.