We present DualAD, a novel autonomous driving framework designed to imitate human reasoning during driving. DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring minimal reasoning, and an upper layer featuring a rule-based text encoder that converts driving scenarios from absolute states into text descriptions. This text is then processed by a large language model (LLM) to make driving decisions. The upper layer intervenes in the bottom layer's decisions when potential danger is detected, mimicking human reasoning in critical situations. Closed-loop experiments demonstrate that DualAD, using a zero-shot pre-trained model, significantly outperforms rule-based motion planners that lack reasoning abilities. Our experiments also highlight the effectiveness of the text encoder, which considerably enhances the model's scenario understanding. Additionally, the integrated DualAD model improves with stronger LLMs, indicating the framework's potential for further enhancement. We make code and benchmarks publicly available.
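The two-layer flow described above can be sketched in a few lines. Everything below is a hypothetical illustration under assumed names (`Scene`, `rule_based_planner`, `text_encoder`, `llm_decision`): the bottom layer applies simple rules, the encoder renders the scene as text, and the upper layer overrides the planner only when it detects danger. A real system would replace the stubbed `llm_decision` with an actual LLM call.

```python
# Minimal sketch of DualAD's dual-layer decision flow.
# All class/function names are illustrative assumptions, not the authors' API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scene:
    ego_speed: float    # ego vehicle speed, m/s
    lead_gap: float     # distance to lead vehicle, m
    lead_speed: float   # lead vehicle speed, m/s

def rule_based_planner(scene: Scene) -> str:
    """Bottom layer: routine driving handled by simple rules."""
    return "keep_lane" if scene.lead_gap > 30.0 else "follow"

def text_encoder(scene: Scene) -> str:
    """Rule-based encoder: absolute states -> natural-language description."""
    return (f"Ego drives at {scene.ego_speed:.1f} m/s. "
            f"Lead vehicle {scene.lead_gap:.1f} m ahead at {scene.lead_speed:.1f} m/s.")

def llm_decision(prompt: str) -> Optional[str]:
    """Upper layer: stand-in for an LLM call. Returns an override action
    when the textual scene suggests imminent danger, else None."""
    if "2.0 m ahead" in prompt:   # toy danger check in place of real LLM reasoning
        return "brake"
    return None

def dual_ad(scene: Scene) -> str:
    """Upper layer intervenes in the bottom layer only when needed."""
    override = llm_decision(text_encoder(scene))
    return override if override is not None else rule_based_planner(scene)
```

For example, a scene with a stopped vehicle 2 m ahead would trigger the upper-layer override (`brake`), while an open road falls through to the bottom-layer planner (`keep_lane`).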