Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Although existing efforts have yielded impressive performance improvements, a gap persists in scene cognition and in understanding complex traffic semantics. This paper proposes Traj-LLM, the first work to investigate the potential of using Large Language Models (LLMs), without explicit prompt engineering, to generate future motion from agents' past/observed trajectories and scene semantics. Traj-LLM starts with sparse context joint coding, which converts agent and scene features into a form that LLMs can understand. On this basis, we explore LLMs' powerful comprehension abilities to capture a spectrum of high-level scene knowledge and interactive information. To emulate the human-like cognitive function of focusing on lanes and to further enhance Traj-LLM's scene comprehension, we introduce lane-aware probabilistic learning powered by the Mamba module. Finally, a multi-modal Laplace decoder is designed to achieve scene-compliant multi-modal predictions. Extensive experiments show that Traj-LLM, fortified by LLMs' strong prior knowledge and understanding, together with lane-aware probabilistic learning, outperforms state-of-the-art methods across evaluation metrics. Moreover, the few-shot analysis further substantiates Traj-LLM's performance: with just 50% of the dataset, it outperforms the majority of benchmarks that rely on the complete data. This study explores equipping the trajectory prediction task with the advanced capabilities inherent in LLMs, furnishing a more universal and adaptable solution for forecasting agent motion.
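
The abstract does not spell out how the multi-modal Laplace decoder is trained. As a rough illustration only, the sketch below shows one common way such a decoder's output could be scored: each of K candidate trajectories carries a per-coordinate Laplace location and scale, and the negative log-likelihood is evaluated on the candidate closest to the ground truth (a winner-takes-all scheme). The function names, the winner-takes-all selection, and the data layout are all assumptions made for this example, not details taken from Traj-LLM.

```python
import math

def laplace_nll(mu, b, y):
    """Mean per-step negative log-likelihood of trajectory y under
    independent Laplace(mu, b) distributions on the x and y coordinates.
    mu, b, y are lists of (x, y) tuples with one entry per time step."""
    total = 0.0
    for (mx, my), (bx, by), (yx, yy) in zip(mu, b, y):
        total += math.log(2.0 * bx) + abs(yx - mx) / bx
        total += math.log(2.0 * by) + abs(yy - my) / by
    return total / len(y)

def closest_mode(mus, y):
    """Index of the candidate with the smallest average displacement
    from the ground-truth trajectory y (hypothetical selection rule)."""
    def avg_disp(mu):
        return sum(math.hypot(mx - yx, my - yy)
                   for (mx, my), (yx, yy) in zip(mu, y)) / len(y)
    return min(range(len(mus)), key=lambda k: avg_disp(mus[k]))

def wta_laplace_loss(mus, bs, y):
    """Winner-takes-all loss: Laplace NLL of the closest candidate only.
    Returns (index of the selected candidate, its loss)."""
    k = closest_mode(mus, y)
    return k, laplace_nll(mus[k], bs[k], y)

if __name__ == "__main__":
    # Two candidate futures over two time steps, unit scale everywhere.
    mus = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
    bs = [[(1.0, 1.0), (1.0, 1.0)], [(1.0, 1.0), (1.0, 1.0)]]
    truth = [(0.0, 0.0), (1.0, 0.0)]
    print(wta_laplace_loss(mus, bs, truth))
```

In a scheme like this, the scale `b` acts as a learned uncertainty: a larger scale reduces the penalty on the absolute error but pays a `log(2b)` cost, so a model is discouraged from reporting high uncertainty everywhere.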
