Motivated by the emergent reasoning capabilities of Vision Language Models (VLMs) and their potential to improve the comprehensibility of autonomous driving systems, this paper introduces a closed-loop autonomous driving controller called VLM-MPC, which combines a VLM for high-level decision-making with a Model Predictive Controller (MPC) for low-level vehicle control. The proposed VLM-MPC system is structurally divided into two asynchronous components: an upper-level VLM and a lower-level MPC. The upper-level VLM generates driving parameters for lower-level control based on front camera images, the ego vehicle state, traffic environment conditions, and reference memory. The lower-level MPC controls the vehicle in real time using these parameters, accounting for engine lag and providing state feedback to the entire system. Experiments based on the nuScenes dataset validated the effectiveness of the proposed VLM-MPC system across various scenarios (e.g., night, rain, intersections). Results showed that the VLM-MPC system consistently outperformed baseline models in terms of safety and driving comfort. By comparing behaviors under different weather conditions and scenarios, we demonstrated the VLM's ability to understand the environment and make reasonable inferences.
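To make the two-layer interface concrete, the following is a minimal, hypothetical sketch of the lower-level loop: a VLM-style parameter set (the names `desired_speed`, `min_headway`, and `max_accel` are illustrative assumptions, not the paper's actual interface) fed into a longitudinal receding-horizon controller that models engine lag as a first-order filter. A real implementation would use a proper optimization solver; here a simple grid search over candidate commands stands in for it.

```python
import numpy as np

# Hypothetical high-level parameters a VLM might emit for the MPC
# (names and values are illustrative assumptions, not the paper's interface).
vlm_params = {"desired_speed": 8.0,   # m/s, e.g. a cautious speed on a rainy night
              "min_headway": 10.0,    # m, desired gap to the lead vehicle
              "max_accel": 1.5}       # m/s^2, comfort bound on acceleration

def mpc_step(ego_pos, ego_vel, lead_pos, lead_vel, params,
             dt=0.2, horizon=15, tau=0.5, accel_prev=0.0):
    """One receding-horizon step for longitudinal control.

    Engine lag is modeled as a first-order filter on the commanded
    acceleration: a_{k+1} = a_k + (dt / tau) * (u - a_k).
    The command u is chosen by grid search over a small candidate set,
    standing in for a real MPC solver.
    """
    candidates = np.linspace(-params["max_accel"], params["max_accel"], 31)
    best_u, best_cost = 0.0, float("inf")
    for u in candidates:
        pos, vel, acc = ego_pos, ego_vel, accel_prev
        lp = lead_pos
        cost = 0.0
        for _ in range(horizon):
            acc += dt / tau * (u - acc)       # engine-lag dynamics
            vel = max(0.0, vel + acc * dt)    # no reversing
            pos += vel * dt
            lp += lead_vel * dt               # constant-velocity lead model
            gap = lp - pos
            # Track the VLM's desired speed, lightly penalize effort,
            # and heavily penalize violating the desired headway.
            cost += (vel - params["desired_speed"]) ** 2 + 0.1 * u ** 2
            if gap < params["min_headway"]:
                cost += 100.0 * (params["min_headway"] - gap) ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u
```

With the road clear, the controller commands acceleration toward the desired speed; with a stopped lead vehicle just ahead, it commands braking. In the asynchronous design described above, the VLM would refresh `vlm_params` at a much slower rate than this control loop runs.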