Real-world autonomous driving systems must make safe decisions in the face of rare and diverse traffic scenarios. Current state-of-the-art planners are mostly evaluated on real-world datasets such as nuScenes (open-loop) or nuPlan (closed-loop). nuPlan in particular appears to be an expressive evaluation method, since it is closed-loop and based on real-world data, yet it mostly covers basic driving scenarios. This makes it difficult to judge a planner's ability to generalize to rarely seen situations. We therefore propose interPlan, a novel closed-loop benchmark containing several edge cases and challenging driving scenarios. We assess existing state-of-the-art planners on our benchmark and show that neither rule-based nor learning-based planners can safely navigate the interPlan scenarios. A recently emerging direction is the use of foundation models such as large language models (LLMs) to handle generalization. We evaluate an LLM-only planner and introduce a novel hybrid planner that combines an LLM-based behavior planner with a rule-based motion planner; this hybrid planner achieves state-of-the-art performance on our benchmark.