TrTr: A Versatile Pre-Trained Large Traffic Model based on Transformer for Capturing Trajectory Diversity in Vehicle Population

Understanding trajectory diversity is fundamental to addressing practical traffic tasks. However, capturing this diversity is challenging, particularly for traditional machine learning methods and recurrent neural networks, because it requires models with a large number of parameters. The Transformer, whose parallel computation makes models with hundreds of millions of parameters practical, offers a promising solution. In this study, we apply the Transformer architecture to traffic tasks, aiming to learn the diversity of trajectories within vehicle populations. We analyze the Transformer's attention mechanism and how well it fits the goals of traffic tasks, and then design specific pre-training tasks. To this end, we create a data structure tailored to the attention mechanism and introduce a set of noise types corresponding to spatio-temporal demands, which are added to the structured data during pre-training. The pre-trained model captures the spatial distribution of the vehicle population well, with no instances of vehicle overlap and an RMSE of 0.6059 against the ground truth. In time series prediction, approximately 95% of the predicted trajectory speeds fall within 7.5144 m/s of the true speeds. Furthermore, in the stability test, the model proves robust: it continuously predicts a time series ten times longer than the input sequence, producing smooth trajectories and diverse driving behaviors. The pre-trained model also provides a good basis for downstream fine-tuning tasks. Our model has over 50 million parameters.
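The abstract reports two kinds of evaluation figures: an RMSE against ground truth and the share of predicted speeds that fall within a given deviation of the true speeds. It does not state how these were computed, so the following is only a minimal sketch assuming the standard definitions of both quantities. The function names `rmse` and `coverage_within` and the sample speeds are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared_errors) / len(predicted))


def coverage_within(predicted, actual, tolerance):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Hypothetical speeds in m/s, for illustration only (not data from the paper).
predicted_speeds = [10.0, 12.0, 15.0, 30.0]
actual_speeds = [11.0, 12.0, 13.0, 20.0]

print(rmse(predicted_speeds, actual_speeds))
print(coverage_within(predicted_speeds, actual_speeds, tolerance=7.5144))
```

Under this reading, a statement such as "about 95% of speeds within 7.5144 m/s" corresponds to `coverage_within(...)` returning roughly 0.95 at that tolerance.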
