We present a novel autonomous driving framework, DualAD, designed to imitate human reasoning during driving. DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring minimal reasoning, and an upper layer featuring a rule-based text encoder that converts driving scenarios from absolute states into a textual description. This description is then processed by a large language model (LLM) to make driving decisions. The upper layer intervenes in the bottom layer's decisions when potential danger is detected, mimicking human reasoning in critical situations. Closed-loop experiments demonstrate that DualAD, using a zero-shot pre-trained model, significantly outperforms rule-based motion planners that lack reasoning abilities. Our experiments also highlight the effectiveness of the text encoder, which considerably enhances the model's scenario understanding. Additionally, the integrated DualAD model improves with stronger LLMs, indicating the framework's potential for further enhancement. Code and benchmarks are available at github.com/TUM-AVS/DualAD.
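The two-layer interaction described above can be sketched as a minimal control loop. All function and field names below are illustrative assumptions, not the actual DualAD API; the real implementation is in the linked repository, and the LLM call is replaced here by a trivial placeholder.

```python
# Hypothetical sketch of a DualAD-style two-layer step:
# bottom layer plans by rules; upper layer encodes the scene as text,
# queries an LLM-like assessor, and intervenes only on detected danger.

def rule_based_planner(state):
    # Bottom layer: routine driving with simple heuristics (illustrative).
    return {"action": "keep_lane", "target_speed": state["speed_limit"]}

def encode_scenario_as_text(state):
    # Rule-based text encoder: absolute states -> natural-language description.
    lines = [f"Ego vehicle speed: {state['ego_speed']} m/s."]
    for obj in state["objects"]:
        lines.append(
            f"{obj['type']} at {obj['distance']} m ahead, "
            f"closing speed {obj['closing_speed']} m/s."
        )
    return " ".join(lines)

def llm_assess(description):
    # Placeholder for the LLM call; returns (danger_detected, override).
    # A real system would prompt a pre-trained model with `description`.
    if "closing speed" in description and " 2 m ahead" in description:
        return True, {"action": "brake", "target_speed": 0.0}
    return False, None

def dual_ad_step(state):
    plan = rule_based_planner(state)       # bottom layer: routine plan
    text = encode_scenario_as_text(state)  # upper layer: text encoder
    danger, override = llm_assess(text)    # upper layer: reasoning
    return override if danger else plan    # intervene only when dangerous
```

The key design point the sketch illustrates is that the LLM never plans routine motion; it only overrides the rule-based plan when its scene understanding flags a critical situation.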