Trajectory prediction is fundamental to intelligent technologies such as autonomous driving and robotics. Predicting the motion of pedestrians and vehicles supports emergency braking, reduces collisions, and improves traffic safety. Current trajectory prediction research must contend with complex social interactions, highly dynamic scenes, and multi-modality, and it remains limited in long-horizon prediction in particular. We propose Attention-aware Social Graph Transformer Networks for multi-modal trajectory prediction. We combine Graph Convolutional Networks and Transformer Networks by generating stable-resolution pseudo-images from spatio-temporal graphs through a purpose-designed stacking and interception method. Furthermore, we design an attention-aware module to handle social interaction information in scenarios involving mixed pedestrian-vehicle traffic. In this way, we retain the advantages of both the graph and the Transformer: the ability to aggregate information over an arbitrary number of neighbors and the ability to process complex time-dependent data. We conduct experiments on pedestrian, vehicle, and mixed-trajectory datasets. The results show that our model minimizes displacement errors across various metrics and significantly reduces the likelihood of collisions. Notably, our model effectively reduces the final displacement error, which illustrates its ability to predict over long horizons.
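The abstract reports displacement errors, and the final displacement error in particular, without defining them. As a rough illustration only, the sketch below uses the definitions commonly adopted in the trajectory prediction literature: average displacement error (ADE) is the mean Euclidean distance over all predicted time steps, and final displacement error (FDE) is the distance at the last step. The function names `displacement_errors` and `best_of_k`, the 2D point format, and the best-of-K evaluation convention are assumptions made for this example and are not taken from the paper.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and true position at each time step
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # averaged over the whole prediction horizon
    fde = dists[-1]                # error at the final predicted step only
    return ade, fde


def best_of_k(samples, truth):
    """Assumed multi-modal convention: lowest ADE and lowest FDE over K samples.

    The two minima are taken independently, so they may come from
    different samples.
    """
    errors = [displacement_errors(sample, truth) for sample in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)


if __name__ == "__main__":
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    drifting = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(displacement_errors(drifting, truth))
    print(best_of_k([drifting, truth], truth))
```

Because FDE looks only at the last time step, it is the more sensitive of the two to drift that accumulates over the horizon, which is why a reduced FDE is read as evidence of better long-horizon prediction.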