Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and to quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.