Predicting the future motions of nearby agents is essential for an autonomous vehicle to take safe and effective actions. In this paper, we propose TSGN, a framework that uses Temporal Scene Graph Neural Networks with projected vectorized representations for multi-agent trajectory prediction. The projected vectorized representation models the traffic scene as a graph constructed from a set of vectors. These vectors represent the agents, the road network, and their relative spatial relationships. All relative features under this representation are both translation- and rotation-invariant. Based on this representation, TSGN captures spatial-temporal features of the agents, the road network, the interactions among them, and the temporal dependencies of the traffic scenes. TSGN can predict plausible and accurate multimodal future trajectories for all agents simultaneously. We also propose a Hierarchical Lane Transformer for capturing interactions between agents and the road network; it filters the surrounding road network and keeps only the lane segments most likely to affect the future behavior of the target agent. This filtering greatly reduces the computational burden without sacrificing prediction performance. Experiments show that TSGN achieves state-of-the-art performance on the Argoverse motion forecasting benchmark.
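To make the invariance claim concrete, the following is a minimal sketch of one common way to obtain translation- and rotation-invariant relative features: expressing one agent's pose in another agent's local frame. It is an illustration only, not TSGN's actual feature construction, which the abstract does not specify. The pose layout `(x, y, heading)` and the function names `relative_features` and `transform` are assumptions introduced here.

```python
import math


def relative_features(agent_i, agent_j):
    """Express agent_j's pose in agent_i's local frame.

    Each agent is a hypothetical (x, y, heading) tuple in a shared global frame,
    with heading in radians.
    """
    xi, yi, hi = agent_i
    xj, yj, hj = agent_j
    dx, dy = xj - xi, yj - yi
    # Rotate the displacement by -hi so agent_i's heading becomes the local x-axis.
    cos_h, sin_h = math.cos(hi), math.sin(hi)
    local_x = cos_h * dx + sin_h * dy
    local_y = -sin_h * dx + cos_h * dy
    # Wrap the heading difference into [-pi, pi].
    dh = math.atan2(math.sin(hj - hi), math.cos(hj - hi))
    return (local_x, local_y, dh)


def transform(agent, angle, shift):
    """Rotate a pose about the global origin by `angle`, then translate it by `shift`."""
    x, y, h = agent
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], h + angle)


# Moving the whole scene rigidly should leave the relative features unchanged.
a = (1.0, 2.0, 0.3)
b = (4.0, -1.0, 1.2)
before = relative_features(a, b)
after = relative_features(transform(a, 0.7, (10.0, -5.0)),
                          transform(b, 0.7, (10.0, -5.0)))
unchanged = all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(before, after))
```

Because the displacement and the heading difference are both measured relative to `agent_i`, any global rotation or translation applied to the whole scene cancels out, which is the property the abstract attributes to its relative features.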