Accurate motion prediction for pedestrians, cyclists, and surrounding vehicles (collectively called agents) is critical for autonomous driving. Most existing works capture map information through a one-stage interaction with the map using vector-based attention, which provides map constraints for social interaction and multi-modal differentiation. However, these methods must encode all required map rules into the focal agent's feature in a single step, so that the feature retains the paths of all possible intentions while at the same time adapting to potential social interactions. In this work, a progressive interaction network is proposed that enables the agent's feature to progressively focus on relevant map elements, so as to learn agent feature representations that better capture the relevant map constraints. The network progressively encodes the complex influence of map constraints into the agent's feature through graph convolutions at three stages: after the historical trajectory encoder, after social interaction, and after multi-modal differentiation. In addition, a weight allocation mechanism is proposed for multi-modal training, so that each mode can obtain learning opportunities from a single-mode ground truth. Experiments validate the superiority of progressive interaction over the existing one-stage interaction and demonstrate the effectiveness of each component. Encouraging results are obtained on challenging benchmarks.
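To make the idea of weight allocation for multi-modal training concrete, the following is a minimal sketch and not the formulation used in this work, which the abstract does not specify. It assumes that each predicted mode is scored by its average displacement error against the single ground-truth trajectory, and that weights come from a softmax over the negative errors, so that every mode receives a non-zero share of the training signal instead of only the closest one. The function names, the `temperature` parameter, and the toy trajectories are all hypothetical.

```python
import math

def mode_errors(pred_modes, gt):
    """Average displacement error of each predicted mode against the ground truth."""
    errors = []
    for traj in pred_modes:
        dists = [math.hypot(px - gx, py - gy)
                 for (px, py), (gx, gy) in zip(traj, gt)]
        errors.append(sum(dists) / len(dists))
    return errors

def allocate_weights(errors, temperature=1.0):
    """Assumed scheme: softmax over negative errors.

    Modes closer to the ground truth get larger weights, but no mode gets zero.
    Subtracting the minimum error keeps the exponentials numerically stable.
    """
    best = min(errors)
    scores = [math.exp(-(e - best) / temperature) for e in errors]
    total = sum(scores)
    return [s / total for s in scores]

def weighted_loss(pred_modes, gt, temperature=1.0):
    """Weighted sum of per-mode errors, plus the weights for inspection."""
    errors = mode_errors(pred_modes, gt)
    weights = allocate_weights(errors, temperature)
    loss = sum(w * e for w, e in zip(weights, errors))
    return loss, weights

# Toy example: one ground-truth path and three hypothetical predicted modes.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred_modes = [
    [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2)],    # close to the ground truth
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],    # veers left
    [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)],  # veers right
]
loss, weights = weighted_loss(pred_modes, gt)
```

Under this assumed scheme, a lower `temperature` concentrates the weight on the closest mode and approaches winner-takes-all training, while a higher value spreads the learning signal more evenly across modes.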