Motion prediction is among the most fundamental tasks in autonomous driving. Traditional motion forecasting methods primarily encode vectorized map information and the historical trajectories of traffic participants; they lack a comprehensive understanding of overall traffic semantics, which limits prediction performance. In this paper, we use Large Language Models (LLMs) to enhance global traffic context understanding for motion prediction. We first conduct systematic prompt engineering, visualizing complex traffic environments and the historical trajectories of traffic participants as an image prompt, the Transportation Context Map (TC-Map), accompanied by corresponding text prompts. With these prompts, we obtain rich traffic context information from the LLM. By integrating this information into the motion prediction model, we demonstrate that such context improves prediction accuracy. Furthermore, considering the cost of LLMs, we propose a cost-effective deployment strategy: improving motion prediction accuracy at scale with only 0.7\% LLM-augmented data. Our research offers insights into improving LLMs' understanding of traffic scenes and the motion prediction performance of autonomous driving.