In this paper, we introduce a novel approach to trajectory generation for autonomous driving that harnesses the complementary strengths of diffusion probabilistic models (also known as diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance scene diversity and stochasticity, the historical trajectory data is first preprocessed and encoded into a latent space using Denoising Diffusion Probabilistic Models (DDPM) enhanced with Diffusion with Transformer (DiT) blocks. The latent features, historical trajectories, HD map features, and historical traffic signal information are then fused by several transformer-based encoders. Finally, a trajectory decoder decodes the encoded traffic scenes to generate multimodal future trajectories. Comprehensive experimental results show that the proposed approach achieves superior performance in generating trajectories that are both realistic and diverse, indicating its potential for integration into autonomous driving simulation systems.
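To make the diffusion component more concrete, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a historical trajectory. It is not the WcDT implementation: the abstract does not specify the noise schedule, the number of steps, or the trajectory representation, so the linear schedule, the 1000 steps, the 2D waypoint format, and all function names here are assumptions chosen for illustration. The DiT-based denoiser that would be trained to invert this process is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(trajectory, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a list of (x, y) waypoints.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with eps ~ N(0, I). `t` is a 0-based index into `alpha_bars`.
    Returns the noisy waypoints and the noise that was added.
    """
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noisy, noise = [], []
    for x, y in trajectory:
        eps_x, eps_y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        noise.append((eps_x, eps_y))
        noisy.append((signal * x + noise_scale * eps_x,
                      signal * y + noise_scale * eps_y))
    return noisy, noise


# Hypothetical usage: noise a straight 10-waypoint history at step 500.
rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
history = [(float(i), 0.5 * i) for i in range(10)]
noisy_history, added_noise = q_sample(history, 500, alpha_bars, rng)
```

In a DDPM, a denoising network is trained to predict the added noise from the noisy sample and the step index; sampling then runs the learned reverse process from pure noise, which is what lets the generated trajectories vary from draw to draw.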