Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but its results do not reflect closed-loop performance. On the other hand, closed-loop evaluation is possible in simulation, but it is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. As a result, it has been difficult to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, in which we combine large datasets with a non-reactive simulator to enable large-scale real-world benchmarking. Specifically, we gather simulation-based metrics, such as progress and time to collision, by unrolling bird's eye view abstractions of the test scenes for a short simulation horizon. Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other. As we demonstrate empirically, this decoupling allows open-loop metric computation while being better aligned with closed-loop evaluations than traditional displacement errors. NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights. On a large set of challenging scenarios, we observe that simple methods with moderate compute requirements, such as TransFuser, can match recent large-scale end-to-end driving architectures such as UniAD. Our modular framework can potentially be extended with new datasets, data curation strategies, and metrics, and will be continually maintained to host future challenges. Our code is available at https://github.com/autonomousvision/navsim.
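
To make the idea of non-reactive metric computation concrete, the following is a minimal sketch and not the NAVSIM implementation. It assumes that a planned ego trajectory and the logged trajectories of other agents are given as bird's eye view positions sampled at a fixed time step. The function name `unroll_non_reactive`, the circular footprint of radius `radius`, the step `dt`, and the two simplified metrics (path length as progress, time of first footprint overlap as a stand-in for time to collision) are all illustrative assumptions; the benchmark's actual metric definitions may differ.

```python
import numpy as np


def unroll_non_reactive(ego_xy, agents_xy, dt=0.5, radius=1.0):
    """Score a planned ego trajectory against logged agent trajectories.

    ego_xy:    (T, 2) planned ego positions in bird's eye view, in meters.
    agents_xy: (N, T, 2) logged positions of N other agents.
    Non-reactive: the agents replay their logs and never respond to the ego.
    All names and metric definitions here are hypothetical simplifications.
    """
    ego_xy = np.asarray(ego_xy, dtype=float)
    agents_xy = np.asarray(agents_xy, dtype=float)

    # Progress: total path length of the planned ego trajectory.
    step_lengths = np.linalg.norm(np.diff(ego_xy, axis=0), axis=1)
    progress = float(step_lengths.sum())

    # Distance from the ego to every agent at every time step, shape (N, T).
    dists = np.linalg.norm(agents_xy - ego_xy[None, :, :], axis=-1)

    # Two circular footprints overlap when centers are closer than 2 * radius.
    collided = (dists < 2.0 * radius).any(axis=0)  # shape (T,)
    first_collision = float(np.argmax(collided) * dt) if collided.any() else None

    return {"progress_m": progress, "first_collision_s": first_collision}


# Ego drives 8 m along the x-axis in four steps of 0.5 s.
ego = [(0, 0), (2, 0), (4, 0), (6, 0), (8, 0)]
# One agent parked near the end of the path, one far away; both just replay logs.
agents = [
    [(8, 0.5)] * 5,
    [(0, 10)] * 5,
]
metrics = unroll_non_reactive(ego, agents)
print(metrics)
```

Because the agents only replay their logged positions, the metrics depend on the planned trajectory alone. This allows them to be computed offline for each test scene, as in open-loop evaluation, without running a closed-loop simulation.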