In intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing closely related subcategories within a broader category. The task is inherently challenging because the category subdivisions are extremely fine and only limited data are available for each category. To address these challenges, this work proposes CSDNet, a framework that combines contrastive learning and self-distillation to learn discriminative representations tailored to Ultra-FGVC. CSDNet comprises three main modules: Subcategory-Specific Discrepancy Parsing (SSDP), Dynamic Discrepancy Learning (DDL), and Subcategory-Specific Discrepancy Transfer (SSDT), which together improve the generalization of deep models at the instance, feature, and logit prediction levels. To increase the diversity of training samples, the SSDP module introduces adaptively augmented samples that highlight subcategory-specific discrepancies. The DDL module stores historical intermediate features in a dynamic memory queue and refines the feature learning space through iterative contrastive learning. The SSDT module then distills knowledge of subcategory-specific discrepancies from the inherent structure of the limited training data, using a self-distillation paradigm at the logit prediction level. Experimental results show that CSDNet outperforms current state-of-the-art Ultra-FGVC methods, demonstrating its effectiveness and adaptability on Ultra-FGVC tasks.
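The two loss-level ideas named above, contrastive learning against a memory queue of past features and self-distillation on logits, can be illustrated with a minimal sketch. The abstract does not give the exact formulations used in CSDNet, so everything here is an assumption for illustration: the `MemoryQueue` helper, the supervised InfoNCE-style loss, the temperature-softened KL divergence, and all parameter values are hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # L2-normalize so that dot products are cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


class MemoryQueue:
    """Fixed-size FIFO store of past (feature, label) pairs (hypothetical helper)."""

    def __init__(self, max_size):
        # deque(maxlen=...) silently drops the oldest entry once the queue is full.
        self.items = deque(maxlen=max_size)

    def enqueue(self, feature, label):
        self.items.append((normalize(feature), label))


def queue_contrastive_loss(feature, label, queue, temperature=0.1):
    """Supervised InfoNCE-style loss of one feature against the queue (assumed form)."""
    q = normalize(feature)
    scored = [(dot(q, k) / temperature, lab) for k, lab in queue.items]
    positives = [s for s, lab in scored if lab == label]
    if not positives:
        return 0.0  # no same-class feature stored yet, nothing to pull toward
    m = max(s for s, _ in scored)
    log_denominator = m + math.log(sum(math.exp(s - m) for s, _ in scored))
    # Average negative log-probability of the same-class entries.
    return -sum(s - log_denominator for s in positives) / len(positives)


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def self_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened predictions (assumed form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Usage: a feature close to its stored class incurs a smaller contrastive loss.
queue = MemoryQueue(max_size=4)
queue.enqueue([1.0, 0.0], label=0)
queue.enqueue([0.0, 1.0], label=1)
aligned = queue_contrastive_loss([0.9, 0.1], 0, queue)
misaligned = queue_contrastive_loss([0.1, 0.9], 0, queue)
```

In this sketch the queue plays the role of the "historical intermediate features", and the teacher logits in `self_distillation_loss` would come from the same network (for example, its prediction on another view of the sample) rather than from a separate, larger model; how CSDNet actually chooses the teacher signal is not stated in the abstract.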