Autonomous vehicles must forecast the motion of surrounding agents (pedestrians and vehicles) to make optimal navigation decisions. Existing methods focus on the positions and velocities of these agents and fail to capture semantic information from the scene. Moreover, to mitigate the growth in computational complexity with the number of agents in the scene, some works use Euclidean distance to prune far-away agents. However, a distance-based metric alone is insufficient to select relevant agents and predict their motion accurately. To resolve these issues, we propose the Semantics-aware Interactive Motion Forecasting (SIMF) method, which captures semantics along with spatial information and optimally selects relevant agents for motion prediction. Specifically, we perform a semantics-aware selection of relevant agents from the scene and pass them through an attention mechanism to extract global encodings. These encodings, along with the agents' local information, are passed through an encoder to obtain time-dependent latent variables for a motion policy that predicts future trajectories. Our results show that the proposed approach outperforms state-of-the-art baselines and provides more accurate, scene-consistent predictions.
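The abstract does not give the scoring rule or the network architecture, so the following is only a minimal sketch of the two ideas it names: selecting agents by more than Euclidean distance, and pooling the selected agents with attention into a global encoding. The per-class weights in `SEMANTIC_WEIGHT`, the score `weight / (1 + distance)`, and the function names `select_relevant_agents` and `attention_pool` are all hypothetical and are not taken from SIMF.

```python
import numpy as np

# Hypothetical per-class weights; the abstract does not specify any.
SEMANTIC_WEIGHT = {"pedestrian": 1.5, "vehicle": 1.0}


def select_relevant_agents(ego_pos, positions, classes, k):
    """Return indices of the k agents with the highest illustrative score.

    The score (class weight divided by 1 + distance) is an assumption made
    for this sketch, not the selection rule used by SIMF.
    """
    dists = np.linalg.norm(positions - ego_pos, axis=1)
    weights = np.array([SEMANTIC_WEIGHT[c] for c in classes])
    scores = weights / (1.0 + dists)
    order = np.argsort(-scores, kind="stable")  # highest score first
    return order[:k]


def attention_pool(query, features):
    """Scaled dot-product attention of a single query over agent features.

    Returns the pooled (global) encoding and the attention weights.
    """
    d = features.shape[1]
    logits = features @ query / np.sqrt(d)
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights @ features, weights


# Toy scene: a pedestrian at 2 m, a vehicle at 1.5 m, a vehicle at 30 m.
ego_pos = np.array([0.0, 0.0])
positions = np.array([[2.0, 0.0], [1.5, 0.0], [30.0, 0.0]])
velocities = np.array([[0.0, 1.0], [5.0, 0.0], [-8.0, 0.0]])
classes = ["pedestrian", "vehicle", "vehicle"]

# Distance-only pruning keeps the nearest agent (the vehicle, index 1).
nearest = np.argsort(np.linalg.norm(positions - ego_pos, axis=1))[:1]

# The semantics-weighted score keeps the pedestrian (index 0) instead.
selected_one = select_relevant_agents(ego_pos, positions, classes, k=1)

# Pool the two most relevant agents into one global encoding.
selected = select_relevant_agents(ego_pos, positions, classes, k=2)
agent_features = np.hstack([positions[selected], velocities[selected]])
ego_query = np.array([0.0, 0.0, 3.0, 0.0])  # ego position and velocity
global_encoding, attn = attention_pool(ego_query, agent_features)
```

In the toy scene, the nearest agent and the highest-scoring agent differ, which illustrates the abstract's point that distance alone can drop an agent that matters. A learned model would replace both the hand-set weights and the raw feature vectors with trained components.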