The use of Large Language Models (LLMs) in reinforcement learning, particularly as planners, has attracted significant attention in recent literature. However, most existing work focuses on robotic planning models that convert the outputs of perception models into language, adopting a `pure-language' strategy. In this work, we propose a hybrid end-to-end learning framework for autonomous driving that combines basic driving imitation learning with LLMs conditioned on multi-modality prompt tokens. Rather than simply converting the perception results of a separately trained model into pure-language input, our contribution is twofold. 1) Visual and LiDAR sensory inputs are integrated end-to-end into learnable multi-modality tokens, which intrinsically alleviates the description bias introduced by separately pre-trained perception models. 2) Instead of letting the LLM drive directly, we explore a hybrid setting in which the LLM helps the driving model correct mistakes and handle complicated scenarios. Our experiments show that the proposed method attains a driving score of 49.21% and a route completion rate of 91.34% in offline evaluation on CARLA, which is comparable to state-of-the-art driving models.
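To make the token-fusion idea concrete, the following is a minimal sketch, not the paper's implementation: all module names, dimensions, and the cross-attention fusion choice are hypothetical. It illustrates how camera and LiDAR features could be projected into a shared embedding space and compressed into a small set of learnable prompt tokens to be prepended to an LLM's input embeddings.

```python
# Hypothetical sketch of learnable multi-modality prompt tokens (not the paper's code).
import torch
import torch.nn as nn

class MultiModalPromptEncoder(nn.Module):
    def __init__(self, img_dim=512, lidar_dim=256, llm_dim=768, n_tokens=8):
        super().__init__()
        # Learnable query tokens that attend to the fused sensor features.
        self.queries = nn.Parameter(torch.randn(n_tokens, llm_dim))
        self.img_proj = nn.Linear(img_dim, llm_dim)
        self.lidar_proj = nn.Linear(lidar_dim, llm_dim)
        self.cross_attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)

    def forward(self, img_feats, lidar_feats):
        # img_feats: (B, N_img, img_dim); lidar_feats: (B, N_lidar, lidar_dim)
        fused = torch.cat([self.img_proj(img_feats), self.lidar_proj(lidar_feats)], dim=1)
        q = self.queries.unsqueeze(0).expand(fused.size(0), -1, -1)
        tokens, _ = self.cross_attn(q, fused, fused)  # (B, n_tokens, llm_dim)
        return tokens  # prepended to the LLM input embeddings as prompt tokens

# Usage with dummy features (shapes are illustrative only).
enc = MultiModalPromptEncoder()
img = torch.randn(2, 196, 512)    # e.g., image patch features
lidar = torch.randn(2, 128, 256)  # e.g., point-cloud pillar features
prompt_tokens = enc(img, lidar)   # (2, 8, 768)
```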