Realistic traffic simulation is crucial for developing self-driving software safely and scalably before real-world deployment. Typically, imitation learning (IL) is used to learn human-like traffic agents directly from real-world observations collected offline. However, without an explicit specification of traffic rules, agents trained with IL alone frequently commit unrealistic infractions such as collisions and driving off the road, and the problem is exacerbated in out-of-distribution and long-tail scenarios. Reinforcement learning (RL), on the other hand, can train traffic agents to avoid infractions, but RL alone yields driving behavior that is not human-like. We propose Reinforcing Traffic Rules (RTR), a holistic closed-loop learning objective that matches expert demonstrations under a traffic compliance constraint. This objective naturally gives rise to a joint IL + RL approach that obtains the best of both worlds. Our method learns in closed-loop simulations of both nominal scenarios from real-world datasets and procedurally generated long-tail scenarios. Our experiments show that RTR learns more realistic and generalizable traffic simulation policies, achieving significantly better tradeoffs between human-like driving and traffic compliance in both nominal and long-tail scenarios. Moreover, when used as a data generation tool for training prediction models, our learned traffic policy leads to considerably improved downstream prediction metrics compared to baseline traffic agents. For more information, visit the project website: https://waabi.ai/rtr
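To make concrete how a compliance-constrained imitation objective can turn into a joint IL + RL loss, here is one possible formalization. It is a sketch only. The notation is assumed for illustration and does not come from the abstract: $\pi$ is the traffic policy, $P^{\pi}$ and $P^{E}$ are the trajectory distributions induced by the policy and by the expert data, $D$ is some divergence between them, $C(\tau) \ge 0$ is a hypothetical infraction cost (for example, a count of collisions and off-road events along trajectory $\tau$), and $\lambda$ is a penalty weight.

```latex
\begin{align}
  % Constrained form: imitate the expert, subject to zero expected infractions.
  \pi^{\star} &= \arg\min_{\pi} \; D\big(P^{\pi}(\tau) \,\|\, P^{E}(\tau)\big)
  \quad \text{s.t.} \quad \mathbb{E}_{\tau \sim P^{\pi}}\big[C(\tau)\big] = 0, \\
  % Penalty (Lagrangian-style) relaxation: an IL term plus an RL term.
  \mathcal{L}(\pi) &= \underbrace{D\big(P^{\pi}(\tau) \,\|\, P^{E}(\tau)\big)}_{\text{imitation (IL)}}
  \; + \; \lambda \, \underbrace{\mathbb{E}_{\tau \sim P^{\pi}}\big[C(\tau)\big]}_{\text{infraction penalty (RL)}},
  \qquad \lambda > 0.
\end{align}
```

Under these assumptions, minimizing the second term is the same as maximizing the expected reward $R(\tau) = -C(\tau)$, which is a standard RL objective, while the first term is a standard imitation objective. Both expectations are taken over trajectories generated by the policy itself, which is consistent with the closed-loop training described above.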