Many approaches have been proposed that use diffusion models to augment training datasets for downstream tasks such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to what extent these models contribute to downstream classification performance. In particular, it is unclear whether they generalize well enough to improve over directly using their pre-training data for augmentation. We systematically evaluate a range of existing methods for generating images from diffusion models, and study new extensions, to assess their benefit for data augmentation. We find that personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using only the pre-training data of the diffusion model, via a simple nearest-neighbor retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models for generating new training data and, surprisingly, finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.
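To make the retrieval baseline concrete, the following is a minimal sketch of nearest-neighbor retrieval over image embeddings. The abstract does not specify the embedding model, the similarity measure, or the number of neighbors, so cosine similarity, the parameter `k`, and the function names `cosine_similarity`, `retrieve_nearest`, and `build_augmentation_set` are all assumptions made for illustration. Embeddings are assumed to be precomputed and are represented as plain lists of floats.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_nearest(
    query: Sequence[float], pool: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Indices of the k pool embeddings most similar to the query, best first."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(pool)]
    # Sort by descending similarity; break ties by the lower pool index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def build_augmentation_set(
    target_embeddings: Sequence[Sequence[float]],
    pool_embeddings: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Sorted union of the top-k retrieved pool indices over all target images."""
    selected = set()
    for query in target_embeddings:
        selected.update(retrieve_nearest(query, pool_embeddings, k))
    return sorted(selected)
```

In this sketch, `pool_embeddings` stands in for the embedded pre-training images and `target_embeddings` for the embedded target training images. The returned indices identify the retrieved images that would be added to the training set. An exhaustive scan like this is only practical for small pools; a pool at pre-training scale would presumably need an approximate nearest-neighbor index instead.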