Trajectory prediction has been a long-standing problem in intelligent systems like autonomous driving and robot navigation. Models trained on large-scale benchmarks have made significant progress in improving prediction accuracy. However, the importance of efficiency for real-time applications has received less attention. This paper proposes an attention-based graph model, named GATraj, which achieves a good balance between prediction accuracy and inference speed. We use attention mechanisms to model the spatial-temporal dynamics of agents, such as pedestrians or vehicles, and a graph convolutional network to model their interactions. Additionally, a Laplacian mixture decoder is implemented to mitigate mode collapse and generate diverse multimodal predictions for each agent. GATraj achieves state-of-the-art prediction performance at a much higher speed when tested on the ETH/UCY datasets for pedestrian trajectories, and good performance at about 100 Hz inference speed when tested on the nuScenes dataset for autonomous driving. We conduct extensive experiments to analyze the probability estimation of the Laplacian mixture decoder and compare it with a Gaussian mixture decoder for predicting different multimodalities. Furthermore, comprehensive ablation studies demonstrate the effectiveness of each proposed module in GATraj. The code is released at https://github.com/mengmengliu1998/GATraj.
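To make the idea of a Laplacian mixture decoder more concrete, the following is a minimal sketch of the negative log-likelihood of a single target point under a mixture of K Laplace components with independent coordinates. The abstract does not state the loss actually used in GATraj, so the function name `laplace_mixture_nll`, the argument shapes, and the choice of a plain mixture likelihood are all illustrative assumptions rather than the released implementation.

```python
import numpy as np


def laplace_mixture_nll(target, loc, scale, logits):
    """Negative log-likelihood of `target` under a K-component Laplace mixture.

    Hypothetical helper for illustration only (not taken from the GATraj code).
    target: shape (D,)    observed position, e.g. D = 2 for (x, y)
    loc:    shape (K, D)  per-component location (one predicted mode each)
    scale:  shape (K, D)  per-component scale, must be strictly positive
    logits: shape (K,)    unnormalized mixture weights
    """
    # Log-density of each component, treating the D coordinates as independent:
    # log p_k(x) = sum_d [ -log(2 * b_kd) - |x_d - mu_kd| / b_kd ]
    log_comp = np.sum(-np.log(2.0 * scale) - np.abs(target - loc) / scale, axis=-1)

    # Log-softmax of the logits gives the log mixture weights.
    log_w = logits - np.logaddexp.reduce(logits)

    # Log-sum-exp over components gives the mixture log-likelihood.
    return -np.logaddexp.reduce(log_w + log_comp)


if __name__ == "__main__":
    # Two hypothetical modes in 2D; the target lies exactly on the first mode.
    target = np.array([0.0, 0.0])
    loc = np.array([[0.0, 0.0], [5.0, 5.0]])
    scale = np.ones((2, 2))
    logits = np.zeros(2)
    print(laplace_mixture_nll(target, loc, scale, logits))
```

Compared with a Gaussian component, the Laplace density penalizes the absolute rather than the squared deviation from the component location, which is the main difference a comparison between the two decoder types would exercise.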