While most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Specifically, we employ a shared context encoder for both intention and trajectory prediction, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community's efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle's future trajectory. By leveraging these intention and occupancy priors, our method performs dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves first place on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the soft mean average precision (softmAP) by 10% over the second-best method on the Waymo Interactive Prediction leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications.
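To make the auto-labeling idea concrete, a coarse intention label could be derived from recorded future trajectories with simple kinematic rules. The sketch below is a hypothetical heuristic with illustrative thresholds, not the paper's actual labeling procedure; the function name and label set are assumptions:

```python
import math

def label_intention(heading_change_rad, lateral_disp_m, speed_delta_mps):
    """Assign a coarse behavioral intention from future-trajectory
    statistics (hypothetical thresholds, for illustration only)."""
    # Large heading change over the horizon -> a turn.
    if abs(heading_change_rad) > math.radians(60):
        return "turn_left" if heading_change_rad > 0 else "turn_right"
    # Significant lateral displacement without a turn -> a lane change.
    if abs(lateral_disp_m) > 2.0:
        return "lane_change_left" if lateral_disp_m > 0 else "lane_change_right"
    # Strong deceleration -> treat as yielding to an interacting agent.
    if speed_delta_mps < -3.0:
        return "yield"
    return "keep_lane"

print(label_intention(0.1, 0.3, -4.0))  # yield
```

In practice such rules would be applied to the logged future of each agent, so no manual annotation is required; the resulting labels then supervise the intention head alongside the trajectory loss.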
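The modality-dependent pruning described above can be sketched as follows: given per-modality occupancy probabilities over map polylines, retain only the most likely occupied polylines for each modality before decoding. This is a minimal illustration with assumed names and shapes, not the paper's implementation:

```python
import numpy as np

def prune_polylines(occupancy_probs, keep_ratio=0.3, min_keep=8):
    """Keep the most likely occupied map polylines per modality.

    occupancy_probs: (num_modes, num_polylines) array of predicted
    probabilities that each polyline is traversed by the target's
    future trajectory under each intention modality.
    Returns a boolean keep-mask of shape (num_modes, num_polylines).
    """
    num_modes, num_polys = occupancy_probs.shape
    k = max(min_keep, int(keep_ratio * num_polys))
    # Indices of the top-k polylines for each modality.
    top_idx = np.argsort(-occupancy_probs, axis=1)[:, :k]
    mask = np.zeros_like(occupancy_probs, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    return mask

# Toy example: 2 modalities, 10 polylines; 3 kept per modality.
rng = np.random.default_rng(0)
probs = rng.random((2, 10))
mask = prune_polylines(probs, keep_ratio=0.3, min_keep=3)
print(mask.sum(axis=1))  # [3 3]
```

Because the mask differs across modalities, each decoded mode attends only to the map elements relevant to its own intention, which is where the computational savings and noise reduction come from.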