In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner, and we analyze its ability to reason, interpret, and memorize when facing complex scenarios. We argue that traditional optimization-based and modular autonomous driving (AD) systems face inherent performance limitations when dealing with long-tail corner cases. To address this problem, we propose that an ideal AD system should drive like a human, accumulating experience through continuous driving and using common sense to solve problems. To achieve this goal, we identify three key abilities necessary for an AD system: reasoning, interpretation, and memorization. We demonstrate the feasibility of employing an LLM in driving scenarios by building a closed-loop system that showcases its comprehension and environment-interaction abilities. Our extensive experiments show that the LLM exhibits an impressive ability to reason about and solve long-tail cases, providing valuable insights for the development of human-like autonomous driving. The related code is available at https://github.com/PJLab-ADG/DriveLikeAHuman.