The evaluation and training of autonomous driving systems require diverse and scalable corner cases. However, most existing scene generation methods lack controllability, accuracy, and versatility, resulting in unsatisfactory generation results. To address this problem, we propose Dragtraffic, a generalized, point-based, and controllable traffic scene generation framework built on conditional diffusion. Dragtraffic enables non-experts to generate a variety of realistic driving scenarios for different types of traffic agents through an adaptive mixture-of-experts architecture. We use a regression model to provide a general initial solution and a conditional-diffusion-based refinement process to ensure diversity. User-customized context is introduced through cross-attention to ensure high controllability. Experiments on a real-world driving dataset show that Dragtraffic outperforms existing methods in terms of authenticity, diversity, and freedom.
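The abstract's generation recipe (a regression model supplies an initial trajectory, which a conditional diffusion process then refines while cross-attention injects user-specified context) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the single-head attention, the `denoise_fn` interface, and the fixed-step update rule are all simplifying assumptions.

```python
import numpy as np

def cross_attention(query, context):
    """Single-head scaled dot-product attention (simplified).
    query: (T, d) trajectory features; context: (M, d) user conditions."""
    scores = query @ context.T / np.sqrt(query.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ context  # (T, d) context-aware features

def refine(init_traj, context, denoise_fn, steps=10, noise_scale=0.1, rng=None):
    """Refine a regression model's initial solution with a toy
    conditional denoising loop. `denoise_fn(x, cond, t)` is a
    hypothetical learned noise predictor, passed in by the caller."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Perturb the initial solution, then iteratively denoise it,
    # conditioning each step on the user-customized context.
    x = init_traj + noise_scale * rng.standard_normal(init_traj.shape)
    for t in range(steps, 0, -1):
        cond = cross_attention(x, context)
        x = x - (1.0 / steps) * denoise_fn(x, cond, t)
    return x
```

With a toy denoiser that predicts the offset from a target trajectory, each step shrinks the residual geometrically, which mirrors how the diffusion stage pulls the coarse regression output toward a plausible scene.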