Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

Fine-grained visual classification (FGVC) involves classifying closely related sub-classes. This task is difficult due to the subtle differences between classes and the high intra-class variance. Moreover, FGVC datasets are typically small and challenging to gather, which creates a significant need for effective data augmentation. Recent advancements in text-to-image diffusion models offer new possibilities for augmenting classification datasets. While these models have been used to generate training data for classification tasks, their effectiveness in full-dataset training of FGVC models remains under-explored. Recent techniques that rely on Text2Image generation or Img2Img methods often struggle to generate images that accurately represent the class while also modifying them enough to significantly increase the dataset's diversity. To address these challenges, we present SaSPA: Structure and Subject Preserving Augmentation. Contrary to recent methods, our method does not use real images as guidance, thereby increasing generation flexibility and promoting greater diversity. To ensure accurate class representation, we employ conditioning mechanisms, specifically conditioning on image edges and subject representation. We conduct extensive experiments and benchmark SaSPA against both traditional and recent generative data augmentation methods. SaSPA consistently outperforms all established baselines across multiple settings, including full-dataset training, contextual bias, and few-shot classification. Additionally, our results reveal interesting patterns in using synthetic data for FGVC models; for instance, we find a relationship between the amount of real data used and the optimal proportion of synthetic data. Code is available at https://github.com/EyalMichaeli/SaSPA-Aug.
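The abstract mentions an optimal proportion of synthetic data but does not say how that proportion enters training. One plausible reading, sketched below purely as an assumption, is a per-sample replacement probability: each real image is swapped for one of its synthetic variants with probability `aug_ratio`. The names `pick_training_sample`, `build_epoch`, `aug_ratio`, and `synthetic_by_real` are hypothetical and are not taken from the SaSPA code base.

```python
import random


def pick_training_sample(real_item, synthetic_items, aug_ratio, rng):
    """Return a synthetic variant with probability aug_ratio, else the real item.

    Assumption: synthetic_items holds augmentations generated for real_item.
    If no augmentations exist, the real item is always returned.
    """
    if synthetic_items and rng.random() < aug_ratio:
        return rng.choice(synthetic_items)
    return real_item


def build_epoch(real_items, synthetic_by_real, aug_ratio, seed=0):
    """Build one epoch's sample list, mixing real and synthetic items.

    The epoch length always equals the number of real items, so aug_ratio
    controls the share of synthetic data without growing the dataset.
    """
    rng = random.Random(seed)
    return [
        pick_training_sample(item, synthetic_by_real.get(item, []), aug_ratio, rng)
        for item in real_items
    ]


# Hypothetical file names, used only to illustrate the data layout.
real = ["bird_001.jpg", "bird_002.jpg", "bird_003.jpg"]
synthetic = {
    "bird_001.jpg": ["bird_001_aug0.jpg", "bird_001_aug1.jpg"],
    "bird_002.jpg": ["bird_002_aug0.jpg"],
}
epoch = build_epoch(real, synthetic, aug_ratio=0.5, seed=42)
```

Under this reading, the relationship the abstract reports would correspond to tuning `aug_ratio` as a function of how much real data is available; the abstract does not state the direction or form of that relationship, so the sketch leaves it as a free parameter.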
