TrTr: A Versatile Pre-Trained Large Traffic Model based on Transformer for Capturing Trajectory Diversity in Vehicle Population

Understanding trajectory diversity is fundamental to addressing practical traffic tasks. However, capturing this diversity is challenging, particularly for traditional machine learning and recurrent neural networks, because it requires large-scale parameters. The Transformer, whose parallel computation makes models with hundreds of millions of parameters practical, offers a promising solution. In this study, we apply the Transformer architecture to traffic tasks, aiming to learn the diversity of trajectories within vehicle populations. We analyze the Transformer's attention mechanism and its adaptability to the goals of traffic tasks, and then design specific pre-training tasks. To this end, we create a data structure tailored to the attention mechanism and introduce a set of noises corresponding to spatio-temporal demands, which are incorporated into the structured data during pre-training. The pre-trained model performs well in capturing the spatial distribution of the vehicle population, with no instances of vehicle overlap and an RMSE of 0.6059 against the ground truth. In time series prediction, approximately 95% of the predicted trajectory speeds fall within 7.5144 m/s of the true speeds. In the stability test, the model proves robust, continuously predicting a time series ten times longer than the input sequence while producing smooth trajectories and diverse driving behaviors. The pre-trained model also provides a good basis for downstream fine-tuning tasks. Our model has over 50 million parameters.
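
The two headline metrics above (an RMSE against ground truth, and a deviation bound covering roughly 95% of predicted speeds) can be illustrated with a minimal sketch. The abstract does not state how either figure was computed, so everything here is an assumption: the function names `rmse` and `deviation_bound` are hypothetical, and the nearest-rank percentile is only one of several possible conventions.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def deviation_bound(predicted, actual, coverage=0.95):
    """Smallest absolute error that covers `coverage` of the samples.

    Uses the nearest-rank percentile (an assumption, not the paper's
    stated method): sort the absolute errors and take the value at
    rank ceil(coverage * n).
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sorted(abs(p - a) for p, a in zip(predicted, actual))
    rank = math.ceil(coverage * len(errors))
    return errors[rank - 1]


if __name__ == "__main__":
    # Toy speeds in m/s, purely illustrative (not data from the paper).
    true_speeds = [10.0, 12.0, 15.0, 9.0, 11.0]
    pred_speeds = [10.5, 11.0, 15.0, 9.5, 13.0]
    print("RMSE:", rmse(pred_speeds, true_speeds))
    print("95% deviation bound:", deviation_bound(pred_speeds, true_speeds))
```

Read this way, a statement such as "95% of predicted speeds fall within 7.5144 m/s" corresponds to `deviation_bound(...)` returning 7.5144 on the evaluation set, under the nearest-rank assumption.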
