Data augmentation is widely used to enhance generalization in visual classification tasks. However, traditional methods struggle when source and target domains differ, as in domain adaptation, because they cannot address domain gaps. This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach that enhances both in-domain and cross-domain image classification. Our technique leverages image editing to generate augmented images based on custom conditional prompts designed specifically for each problem type. By blending portions of the input image with its edited generative counterpart and incorporating fractal patterns, our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models. The efficacy of our method is established through extensive experiments on eight public datasets for general and fine-grained classification, in both in-domain and cross-domain settings. Additionally, we demonstrate performance improvements for self-supervised learning, learning with data scarcity, and adversarial robustness. Compared to existing state-of-the-art methods, our technique achieves stronger performance across the board.