Fine-grained visual classification (FGVC) involves classifying closely related sub-classes. This task is difficult due to the subtle differences between classes and the high intra-class variance. Moreover, FGVC datasets are typically small and challenging to gather, which creates a significant need for effective data augmentation. Recent advances in text-to-image diffusion models offer new possibilities for augmenting classification datasets. While these models have been used to generate training data for classification tasks, their effectiveness in full-dataset training of FGVC models remains under-explored. Recent techniques that rely on Text2Image generation or Img2Img methods often struggle to generate images that accurately represent the class while also modifying them enough to significantly increase the dataset's diversity. To address these challenges, we present SaSPA: Structure and Subject Preserving Augmentation. Contrary to recent methods, our method does not use real images as guidance, thereby increasing generation flexibility and promoting greater diversity. To ensure accurate class representation, we employ conditioning mechanisms, specifically conditioning on image edges and subject representation. We conduct extensive experiments and benchmark SaSPA against both traditional and recent generative data augmentation methods. SaSPA consistently outperforms all established baselines across multiple settings, including full-dataset training, contextual bias, and few-shot classification. Additionally, our results reveal interesting patterns in using synthetic data for FGVC models; for instance, we find a relationship between the amount of real data used and the optimal proportion of synthetic data. Code is available at https://github.com/EyalMichaeli/SaSPA-Aug.