Motion forecasting plays a crucial role in autonomous driving, aiming to predict reasonable future motions of traffic agents. Most existing methods model the historical interactions between agents and the environment and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper, we propose a novel Future Feedback Interaction Network (FFINet) that aggregates features from the current observations and potential future interactions for trajectory prediction. First, we employ different spatial-temporal encoders to embed the decomposed position vectors and the current position of each scene, providing rich features for the subsequent cross-temporal aggregation. Second, relative interaction and cross-temporal aggregation strategies are applied sequentially to integrate features in the current fusion module, observation interaction module, future feedback module, and global fusion module; the future feedback module enables the understanding of pre-action by feeding the influence of preview information to the feedforward prediction. Third, the comprehensive interaction features are fed into the final predictor to generate the joint predicted trajectories of multiple agents. Extensive experimental results show that FFINet achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
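The feedforward-then-feedback idea described in the abstract can be illustrated with a toy sketch. This is not FFINet itself: the learned encoders and fusion modules are replaced by hand-written rules chosen purely for illustration. The function names (`decompose`, `feedforward`, `future_feedback`, `predict`), the constant-velocity rollout, and the `safe_dist` and `gain` parameters are all assumptions introduced here and do not come from the paper.

```python
from math import hypot


def decompose(track):
    """Turn absolute (x, y) positions into per-step displacement vectors."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def feedforward(track, horizon):
    """Preview step: constant-velocity rollout from the last displacement.

    Stand-in for a learned feedforward predictor; needs at least 2 points.
    """
    vx, vy = decompose(track)[-1]
    x, y = track[-1]
    return [(x + vx * t, y + vy * t) for t in range(1, horizon + 1)]


def future_feedback(preview, safe_dist=2.0, gain=0.5):
    """Feedback step: adjust each preview using the other agents' previews.

    Hypothetical rule: if two agents are predicted to be closer than
    safe_dist at the same future step, push both apart.
    """
    refined = []
    for i, traj in enumerate(preview):
        new_traj = []
        for t, (x, y) in enumerate(traj):
            dx = dy = 0.0
            for j, other in enumerate(preview):
                if i == j:
                    continue
                ox, oy = other[t]
                d = hypot(x - ox, y - oy)
                if 0.0 < d < safe_dist:
                    push = gain * (safe_dist - d) / d
                    dx += push * (x - ox)
                    dy += push * (y - oy)
            new_traj.append((x + dx, y + dy))
        refined.append(new_traj)
    return refined


def predict(tracks, horizon=3):
    """Joint prediction: preview every agent, then apply future feedback."""
    preview = [feedforward(track, horizon) for track in tracks]
    return future_feedback(preview)


if __name__ == "__main__":
    # Two agents approaching each other in adjacent lanes (1 m apart in y).
    tracks = [[(0, 0), (1, 0)], [(6, 1), (5, 1)]]
    for agent_id, traj in enumerate(predict(tracks)):
        print(agent_id, traj)
```

In this toy setup the feedback step changes only the future step at which the two previews come within `safe_dist` of each other. That mirrors, in a very reduced form, the idea that anticipated future interactions should influence the final joint prediction.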