DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, a pipeline that is costly, complex, or lacking in diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, which together generate scalable, diverse, and generalizable detection data in a plug-and-play manner. The Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, COCO-DE and VOC-DE, to scale up existing detection benchmarks and facilitate follow-up research. Extensive experiments demonstrate that data scaling-up via DE achieves significant improvements in diverse scenarios, including various detection algorithms, self-supervised pre-training, and data-sparse, label-scarce, cross-domain, and semi-supervised settings. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.
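To make the idea of single-stage data scaling concrete, the following is a minimal sketch of how generated training pairs might be merged into an existing detection dataset. It is not the authors' implementation: the names `Detection`, `TrainingPair`, `scale_up`, and `fake_generate_pair` are hypothetical, the generator is a stub standing in for the diffusion model plus Detection-Adapter, and the confidence threshold `min_score` is an assumption that the abstract does not mention.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    label: str
    score: float  # assumed adapter confidence in [0, 1]


@dataclass
class TrainingPair:
    image_id: str
    detections: List[Detection]
    synthetic: bool


def scale_up(
    real_pairs: List[TrainingPair],
    generate_pair: Callable[[int], TrainingPair],
    num_synthetic: int,
    min_score: float = 0.5,  # hypothetical filter, not from the paper
) -> List[TrainingPair]:
    """Append generated pairs to the real data, dropping low-confidence boxes."""
    scaled = list(real_pairs)
    for i in range(num_synthetic):
        pair = generate_pair(i)
        kept = [d for d in pair.detections if d.score >= min_score]
        if kept:  # skip generated images left with no usable boxes
            scaled.append(TrainingPair(pair.image_id, kept, synthetic=True))
    return scaled


def fake_generate_pair(i: int) -> TrainingPair:
    """Stub generator: a stand-in for image generation plus box prediction."""
    confident = Detection((0.0, 0.0, 10.0, 10.0), "cat", 0.9)
    uncertain = Detection((5.0, 5.0, 20.0, 20.0), "dog", 0.2)
    detections = [uncertain] if i % 2 == 1 else [confident, uncertain]
    return TrainingPair(f"gen_{i}", detections, synthetic=True)


real = [TrainingPair("real_0", [Detection((1.0, 1.0, 5.0, 5.0), "cat", 1.0)], False)]
scaled = scale_up(real, fake_generate_pair, num_synthetic=4)
print(len(scaled), [p.image_id for p in scaled])
```

The point of the sketch is only that each generated image arrives together with its boxes, so no separate labeling pass is needed before the pairs join the training set.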

