In autonomous driving, deep models have shown remarkable performance across various visual perception tasks, but they demand high-quality, highly diverse training datasets. Such datasets are expected to cover a wide range of driving scenarios with adverse weather, varied lighting conditions, and diverse moving objects. However, manually collecting such data is challenging and expensive. With the rapid development of large generative models, we propose DriveDiTFit, a novel method for efficiently generating autonomous Driving data by Fine-tuning pre-trained Diffusion Transformers (DiTs). Specifically, DriveDiTFit uses a gap-driven modulation technique to carefully select and efficiently fine-tune a small subset of the parameters in a DiT according to the discrepancy between the pre-trained source data and the target driving data. Additionally, DriveDiTFit develops an effective weather and lighting condition embedding module, initialized with a nearest-semantic-similarity approach, to ensure diversity in the generated data. Through a progressive tuning scheme that refines detail generation in the early stages of the diffusion process, together with enlarged weights for small objects in the training loss, DriveDiTFit ensures high-quality generation of small moving objects in the generated data. Extensive experiments on driving datasets confirm that our method can efficiently produce diverse, realistic driving data. The source code will be available at https://github.com/TtuHamg/DriveDiTFit.
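The abstract does not give implementation details for the nearest-semantic-similarity initialization, but a minimal sketch of one plausible reading is below: each new weather/lighting condition embedding is copied from the pre-trained class embedding whose label text is semantically closest. All names here (`new_label_feats`, `src_embed_table`, etc.) are illustrative assumptions, not the paper's actual API.

```python
import torch
import torch.nn.functional as F

def init_condition_embeddings(new_label_feats, src_label_feats, src_embed_table):
    """Initialize embeddings for new weather/lighting labels from a
    pre-trained class-embedding table via nearest semantic similarity.

    new_label_feats:  (N_new, d)  text features of the new condition labels
    src_label_feats:  (N_src, d)  text features of the source class labels
    src_embed_table:  (N_src, e)  pre-trained class-embedding table of the DiT
    """
    # Cosine similarity between every new label and every source label.
    sim = F.normalize(new_label_feats, dim=-1) @ F.normalize(src_label_feats, dim=-1).T
    # For each new label, copy the embedding of its semantically closest source class.
    nearest = sim.argmax(dim=-1)             # (N_new,) indices into the source table
    return src_embed_table[nearest].clone()  # (N_new, e) initial condition embeddings
```

Under this reading, a "snowy" condition embedding might start from the source class whose label text is nearest to "snowy", giving the fine-tuning a semantically sensible starting point rather than a random one.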
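Similarly, the abstract's "enlarging the weights corresponding to small objects in the training loss" could be realized roughly as follows: a standard noise-prediction MSE loss whose per-pixel weights are boosted inside small-object regions. The mask source and the `boost` factor are assumptions for illustration, not the paper's stated formulation.

```python
import torch

def weighted_diffusion_loss(noise_pred, noise, small_obj_mask, boost=4.0):
    """Noise-prediction (epsilon) MSE loss with enlarged per-pixel weights
    on regions covered by small moving objects.

    noise_pred, noise: (B, C, H, W) predicted vs. true noise
    small_obj_mask:    (B, 1, H, W) binary mask marking small-object regions
    boost:             weight multiplier applied inside the mask (assumed value)
    """
    weight = 1.0 + (boost - 1.0) * small_obj_mask  # 1 outside the mask, `boost` inside
    per_pixel = (noise_pred - noise) ** 2          # standard diffusion MSE term
    return (weight * per_pixel).mean()             # weighted average over all pixels
```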