Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples

In the field of intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing subtly different subcategories within a broader category. However, the task is inherently challenging because category subdivisions are extremely fine-grained and only a few training samples are available per category. To address these challenges, this work proposes CSDNet, a framework that combines contrastive learning and self-distillation to learn discriminative representations tailored to Ultra-FGVC. CSDNet comprises three main modules: Subcategory-Specific Discrepancy Parsing (SSDP), Dynamic Discrepancy Learning (DDL), and Subcategory-Specific Discrepancy Transfer (SSDT), which improve the generalization of deep models at the instance, feature, and logit prediction levels, respectively. To increase the diversity of training samples, the SSDP module introduces augmented samples from different viewpoints to highlight subcategory-specific discrepancies. The DDL module stores historical intermediate features in a dynamic memory queue and optimizes the feature space through iterative contrastive learning. Furthermore, the SSDT module applies a novel self-distillation paradigm to the logit predictions of raw and augmented samples, distilling additional knowledge of subcategory-specific discrepancies from the inherent structure of the limited training data without requiring extra annotations. Experimental results show that CSDNet outperforms current state-of-the-art Ultra-FGVC methods, underscoring its effectiveness and adaptability in Ultra-FGVC tasks.
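The feature-level (DDL) and logit-level (SSDT) objectives described above can be illustrated with a minimal, dependency-free sketch. This is not CSDNet's implementation: the FIFO queue capacity, the supervised InfoNCE form of the contrastive loss (queued features with the same label treated as positives), the temperatures, and the use of a KL divergence with the raw-sample prediction as the soft target are all assumptions made for illustration, and every class and function name is hypothetical. Plain Python has no autograd, so the sketch only computes loss values; in a real training loop the target side of the distillation would be detached from the gradient.

```python
import math
from collections import deque


def l2_normalize(v):
    # Scale a vector to unit length (guard against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class FeatureQueue:
    """Hypothetical FIFO memory of past (feature, label) pairs.

    When full, appending evicts the oldest entry (deque with maxlen)."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def enqueue(self, feature, label):
        self.items.append((l2_normalize(feature), label))


def queue_contrastive_loss(query, label, queue, temperature=0.1):
    """Assumed supervised InfoNCE loss of one query against queued features.

    Queued features that share the query's label are positives; all queued
    features form the softmax denominator. Returns 0.0 if no positive exists."""
    q = l2_normalize(query)
    logits = [dot(q, f) / temperature for f, _ in queue.items]
    positives = [i for i, (_, lab) in enumerate(queue.items) if lab == label]
    if not positives:
        return 0.0
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(logits[i] - log_denom for i in positives) / len(positives)


def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_self_distillation(raw_logits, aug_logits, temperature=4.0):
    """Assumed KL(p_raw || p_aug) on temperature-softened predictions.

    The raw-sample prediction acts as the soft target for the augmented one;
    scaling by T^2 follows the common knowledge-distillation convention."""
    p = softmax(raw_logits, temperature)
    q = softmax(aug_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    memory = FeatureQueue(capacity=4)
    memory.enqueue([1.0, 0.0], label=0)
    memory.enqueue([0.0, 1.0], label=1)
    print(queue_contrastive_loss([0.9, 0.1], label=0, queue=memory))
    print(logit_self_distillation([2.0, 0.5, -1.0], [1.5, 0.8, -0.5]))
```

A bounded queue lets the contrastive term compare each sample against many historical features without enlarging the batch, which matters when each subcategory has only a few images; the distillation term needs no extra labels because its target comes from the model's own prediction on the raw sample.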

翻译：在智能多媒体分析领域，超细粒度视觉分类（Ultra-FGVC）在区分广泛类别中的复杂子类别方面发挥着关键作用。然而，由于类别划分的复杂粒度以及每个类别可用数据的有限性，该任务本身具有固有挑战。为解决这些问题，本文提出CSDNet框架，该开创性框架有效利用对比学习和自蒸馏技术，专门针对Ultra-FGVC任务学习判别性表征。CSDNet包含三个主要模块：子类别特异性差异解析（SSDP）、动态差异学习（DDL）和子类别特异性差异迁移（SSDT），这三个模块在实例、特征和logit预测层面共同增强深度模型的泛化能力。为增加训练样本多样性，SSDP模块引入不同视角的增强样本以突出子类别特异性差异。同时，所提出的DDL模块通过动态记忆队列存储历史中间特征，通过迭代对比学习优化特征学习空间。此外，SSDT模块通过新颖的自蒸馏范式在原始样本和增强样本的logit预测层面进行开发，无需额外标注即可从有限训练数据的固有结构中有效蒸馏出更多子类别特异性差异知识。实验结果表明，CSDNet在性能上超越了当前最先进的Ultra-FGVC方法，凸显了其在解决Ultra-FGVC任务中的强大功效与适应性。