Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, how to integrate T2I models effectively into fundamental image classification tasks remains an open question. A prevalent strategy for improving image classification performance is to augment the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that, for domain-specific concepts, these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts). To tackle this challenge, we introduce an innovative inter-class data augmentation method, Diff-Mix (https://github.com/Zhicaiwww/Diff-Mix), which enriches the dataset by performing image translations between classes. Our empirical results demonstrate that Diff-Mix achieves a better balance between faithfulness and diversity, leading to a marked improvement in performance across diverse image classification scenarios on domain-specific datasets, including few-shot, conventional, and long-tail classification.