Data is the cornerstone of deep learning. This paper shows that the recently developed diffusion model can serve as a scalable data engine for object detection. Existing methods for scaling up detection-oriented data typically rely on manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs; these pipelines are costly, complex, or lacking in diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, which together generate scalable, diverse, and generalizable detection data in a plug-and-play manner. The Detection-Adapter is trained to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals, yielding better bounding-box predictions. Additionally, we contribute two datasets, COCO-DE and VOC-DE, that scale up existing detection benchmarks to facilitate follow-up research. Extensive experiments demonstrate that data scaling-up via DE achieves significant improvements in diverse scenarios, including various detection algorithms, self-supervised pre-training, and data-sparse, label-scarce, cross-domain, and semi-supervised learning settings. For example, when using DE with a DINO-based adapter to scale up data, mAP improves by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.