The emergence of general human knowledge and strong logical reasoning capacity in rapidly advancing vision-language models (VLMs) has driven increasing interest in applying VLMs to high-level autonomous driving tasks, such as scene understanding and decision-making. However, the relationship between knowledge proficiency, especially essential driving expertise, and closed-loop autonomous driving performance has not yet been studied in depth. In this paper, we investigate how the depth and breadth of fundamental driving knowledge affect closed-loop trajectory planning, and we introduce WiseAD, a specialized VLM tailored for end-to-end autonomous driving that is capable of driving reasoning, action justification, object recognition, risk analysis, driving suggestions, and trajectory planning across diverse scenarios. We train the model jointly on driving knowledge and planning datasets, enabling it to perform knowledge-aligned trajectory planning. Extensive experiments indicate that as the diversity of driving knowledge increases, critical accidents are notably reduced, contributing improvements of 11.9% in driving score and 12.4% in route completion on the CARLA closed-loop evaluations and achieving state-of-the-art performance. Moreover, WiseAD also demonstrates strong performance in knowledge evaluations on both in-domain and out-of-domain datasets.