In autonomous driving, trajectory prediction in complex traffic environments must respect real-world scene context and capture the multimodal nature of agent behavior. Existing methods predominantly rely on prior assumptions or on generative models trained on curated data to learn road agents' stochastic behavior under scene constraints. However, they often suffer from mode averaging caused by data imbalance and simplistic priors, and can even suffer from mode collapse caused by unstable training and supervision from a single ground-truth trajectory. As a result, existing methods lose predictive diversity and fail to adhere to scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens that direct the trajectory's mode, such as going straight, turning left, or turning right. Moreover, we incorporate predicted endpoints into the CDT model as an alternative behavioral token to facilitate accurate trajectory prediction. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels at generating diverse and scene-compliant trajectories in complex urban settings.
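To illustrate how a behavioral token can steer a conditional denoising diffusion sampler, the following is a minimal sketch of standard DDPM ancestral sampling over a 2D trajectory. It is not the CDT implementation: the token vocabulary `BEHAVIORS`, the noise schedule, the horizon of 60 steps, and the functions `toy_mode_trajectory` and `toy_denoiser` are all hypothetical. In particular, `toy_denoiser` is a hand-written stand-in for the Transformer-based noise predictor, and it ignores map and social context entirely.

```python
import numpy as np

# Hypothetical behavioral token vocabulary (an assumption, not from the paper).
BEHAVIORS = ("straight", "left", "right")


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear DDPM noise schedule; the values are illustrative assumptions."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_mode_trajectory(behavior, horizon):
    """Hypothetical clean trajectory for a token; a trained model would learn this."""
    s = np.linspace(0.0, 1.0, horizon)
    curvature = {"straight": 0.0, "left": 1.0, "right": -1.0}[behavior]
    x = 30.0 * s                   # forward progress
    y = curvature * 10.0 * s ** 2  # lateral drift set by the token
    return np.stack([x, y], axis=-1)


def toy_denoiser(x_t, t, behavior, alpha_bars):
    """Stand-in for the learned noise predictor eps_theta(x_t, t, token).

    It assumes the clean trajectory equals the token's prototype and returns
    the noise implied by x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    """
    x0_hat = toy_mode_trajectory(behavior, x_t.shape[0])
    return (x_t - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])


def sample_trajectory(behavior, horizon=60, num_steps=50, seed=0):
    """DDPM ancestral sampling conditioned on a behavioral token."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal((horizon, 2))  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps_hat = toy_denoiser(x, t, behavior, alpha_bars)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


if __name__ == "__main__":
    for token in BEHAVIORS:
        end_x, end_y = sample_trajectory(token)[-1]
        print(f"{token:>8}: endpoint = ({end_x:.2f}, {end_y:.2f})")
```

Because the stand-in denoiser always points toward a single prototype per token, every sample for a given token ends at that prototype. The sketch therefore shows only how the token selects a mode; diversity within a mode, and compliance with the map and surrounding agents, would have to come from a trained network that also receives scene context.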