Image retrieval outperforms diffusion models on data augmentation

Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to which extent these models contribute to downstream classification performance. In particular, it remains unclear if they generalize enough to improve over directly using the additional data of their pre-training process for augmentation. We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. Personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using the pre-training data of the diffusion model alone, via a simple nearest-neighbor retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models in generating new training data, and surprisingly finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.

翻译：许多研究已提出利用扩散模型增强下游任务（如图像分类）的训练数据集。然而，扩散模型本身基于大规模数据集训练，且常带有噪声标注，这类模型对下游分类性能的贡献程度仍是悬而未决的问题。具体而言，尚不明确这些模型是否具备足够的泛化能力，使其优于直接使用预训练过程中的额外数据进行增强。我们系统评估了多种基于扩散模型生成图像的现有方法，并研究了新扩展以评估其在数据增强中的效用。针对目标数据对扩散模型进行个性化定制，其效果优于简单的提示策略。然而，仅通过简单的最近邻检索过程直接使用扩散模型的预训练数据，反而能带来更强的下游性能。本研究探索了扩散模型生成新训练数据的潜力，并意外发现，在简单的下游视觉任务中，这些复杂模型尚未能超越简单而强大的图像检索基线方法。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

DiffRec: 扩散推荐模型（SIGIR'23）

专知会员服务

48+阅读 · 2023年4月16日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日