The challenge in fine-grained visual categorization (FGVC) lies in capturing the subtle differences between subclasses and discriminating among them accurately. Previous research has relied on large-scale annotated data and pre-trained deep models to achieve this objective. However, when only a limited number of samples is available, such methods may become less effective. Diffusion models have been widely adopted for data augmentation owing to the diversity of the data they generate. However, the high level of detail required by fine-grained images makes it difficult to apply existing methods directly. To address this issue, we propose a novel approach termed the detail reinforcement diffusion model (DRDM), which leverages the rich knowledge of large models for fine-grained data augmentation. DRDM comprises two key components: discriminative semantic recombination (DSR) and spatial knowledge reference (SKR). Specifically, DSR is designed to extract implicit similarity relationships from the labels and reconstruct the semantic mapping between labels and instances, which enables better discrimination of subtle differences between subclasses. Furthermore, SKR incorporates the distributions of different datasets as references in the feature space. This allows SKR to aggregate the high-dimensional distribution of subclass features in few-shot FGVC tasks, thereby expanding the decision boundary. Through these two components, we effectively utilize knowledge from large models to alleviate data scarcity, improving performance on fine-grained visual recognition tasks. Extensive experiments demonstrate that DRDM yields consistent performance gains.