Autonomous driving and its widespread adoption have long held tremendous promise. Nevertheless, without a trustworthy and thorough testing procedure, the industry struggles to mass-produce autonomous vehicles (AVs), and neither the general public nor policymakers are convinced to accept the innovations. Generating safety-critical scenarios that pose significant challenges to AVs is an essential first step in testing. Real-world datasets contain naturalistic but overly safe driving behaviors, whereas simulation allows unrestricted exploration of diverse and aggressive traffic scenarios. However, the high-dimensional search space in simulation makes efficient scenario generation infeasible without the real-world data distribution as an implicit constraint. To combine the benefits of both, it is appealing to learn to generate scenarios from offline real-world data and online simulation data simultaneously. We therefore tailor a Reversely Regularized Hybrid Offline-and-Online ((Re)$^2$H2O) Reinforcement Learning recipe that additionally penalizes Q-values on real-world data and rewards Q-values on simulated data, which ensures that the generated scenarios are both varied and adversarial. In extensive experiments, our solution produces more risky scenarios than competitive baselines and generalizes to various autonomous driving models. In addition, the generated scenarios prove useful for fine-tuning AV performance.
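The reverse regularization described above (penalizing Q-values on real-world data, rewarding them on simulated data) can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the form below is an assumption: a mean squared Bellman error plus a weighted difference of mean Q-values. The names `reverse_regularizer`, `critic_loss`, and the weight `beta` are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def reverse_regularizer(q_real, q_sim, beta=1.0):
    """Assumed penalty term, not the paper's exact formula.

    Minimizing it pushes Q-values on real-world samples down (penalty)
    and Q-values on simulated samples up (reward).
    """
    return beta * (fmean(q_real) - fmean(q_sim))


def critic_loss(td_errors, q_real, q_sim, beta=1.0):
    """Mean squared TD error plus the assumed reverse regularizer."""
    bellman = fmean([e * e for e in td_errors])
    return bellman + reverse_regularizer(q_real, q_sim, beta)


if __name__ == "__main__":
    # Toy numbers only; they do not come from any experiment.
    loss = critic_loss(
        td_errors=[1.0, -1.0],
        q_real=[1.0, 3.0],
        q_sim=[2.0, 6.0],
        beta=0.5,
    )
    print(loss)
```

In this sketch, lowering the Q-values on real-world samples or raising them on simulated samples reduces the loss, which is the direction of regularization the abstract describes. How the actual method weights or normalizes these terms is not stated here.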