Trajectory prediction for autonomous driving must continuously reason about the motion stochasticity of road agents and comply with scene constraints. Existing methods typically rely on one-stage trajectory prediction models, which condition future trajectories on observed trajectories combined with fused scene information. However, they often struggle with complex scene constraints, such as those encountered at intersections. To this end, we present a novel method called LAformer. It uses a temporally dense lane-aware estimation module to select only the most promising lane segments in an HD map. This selection continuously aligns motion dynamics with scene information and, by filtering out irrelevant lane segments, reduces the representation requirements for the subsequent attention-based decoder. Additionally, unlike one-stage prediction models, LAformer uses the first-stage predictions as anchor trajectories and adds a second-stage motion refinement module to further explore temporal consistency across the complete time horizon. Extensive experiments on Argoverse 1 and nuScenes demonstrate that LAformer achieves excellent performance for multimodal trajectory prediction.
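To make the lane-selection idea concrete, the following is a minimal sketch of per-time-step top-k filtering over lane-segment scores. It is an illustration under stated assumptions, not the LAformer implementation: the function name `select_top_k_lanes`, the `(T, N)` score layout, the softmax normalization, and the parameter `k` are all hypothetical choices made for this example.

```python
import numpy as np


def select_top_k_lanes(scores: np.ndarray, k: int):
    """Keep the k highest-scoring lane segments at every future time step.

    Assumed (hypothetical) layout: `scores` has shape (T, N), one raw score
    per future time step and candidate lane segment.
    Returns (indices, probabilities), both of shape (T, k), ordered from the
    most to the least likely segment at each time step.
    """
    if scores.ndim != 2:
        raise ValueError("scores must have shape (T, N)")
    if not 1 <= k <= scores.shape[1]:
        raise ValueError("k must be between 1 and the number of lane segments")

    # Numerically stable softmax over the lane-segment axis.
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Sort segments by descending probability and keep the first k.
    top_idx = np.argsort(-probs, axis=1, kind="stable")[:, :k]
    top_probs = np.take_along_axis(probs, top_idx, axis=1)
    return top_idx, top_probs


# Two future time steps, four candidate lane segments.
example_scores = np.array([[0.1, 2.0, 0.5, -1.0],
                           [3.0, 0.0, 0.2, 2.5]])
example_idx, example_probs = select_top_k_lanes(example_scores, k=2)
```

In a sketch like this, only the features of the retained segments would be handed to a downstream attention-based decoder, which is one way to read the abstract's claim that filtering reduces what the decoder has to represent.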