In autonomous driving, it is crucial to accurately interpret the movements of other road users and to use this knowledge to forecast their future trajectories. This is typically achieved by combining map data with the tracked trajectories of the surrounding agents. Many methods fuse this information into a single embedding per agent, which is then used to predict future behavior. A notable drawback of these approaches is that exact location information may be lost during encoding. Although the embedding still captures general map information, it does not guarantee valid and consistent trajectories, so predicted trajectories can stray from the actual lanes. This paper introduces a new refinement module that projects the predicted trajectories back onto the actual map, correcting these discrepancies and yielding more consistent predictions. The module is versatile and can be readily incorporated into a wide range of architectures. Additionally, we propose a novel scene encoder that handles all relations between agents and their environment in a single unified heterogeneous graph attention network. Analyzing the attention values on the different edges of this graph provides insight into the network's inner workings and leads towards more explainable predictions.
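To make the idea of projecting a predicted trajectory back onto the map concrete, the following is a minimal geometric sketch. It is not the refinement module proposed in the paper, whose internals are not described here; it only illustrates one simple interpretation, namely snapping each predicted point to the closest point on a lane centerline given as a polyline. The function names `project_point_to_segment` and `snap_to_centerline`, and the representation of the centerline as a list of `(x, y)` tuples, are assumptions made for this illustration.

```python
import math


def project_point_to_segment(p, a, b):
    """Return the point on segment a-b that is closest to p.

    All arguments are (x, y) tuples. Hypothetical helper for illustration.
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return (ax, ay)
    # Parameter of the orthogonal projection, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_centerline(trajectory, centerline):
    """Snap every trajectory point to its nearest point on the polyline.

    Assumes the centerline has at least two vertices. This is a purely
    geometric stand-in, not the learned module described in the abstract.
    """
    segments = list(zip(centerline, centerline[1:]))
    snapped = []
    for p in trajectory:
        candidates = [project_point_to_segment(p, a, b) for a, b in segments]
        snapped.append(min(candidates, key=lambda q: math.dist(p, q)))
    return snapped


# A straight lane along the x-axis and a prediction that drifts off it.
centerline = [(0.0, 0.0), (10.0, 0.0)]
predicted = [(1.0, 0.5), (5.0, -0.3), (12.0, 1.0)]
refined = snap_to_centerline(predicted, centerline)
```

In this toy case the drifting points are pulled back onto the lane, and the last point, which overshoots the end of the polyline, is clamped to its final vertex. A learned refinement step would presumably also weigh several candidate lanes and temporal consistency, which this sketch ignores.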