Recognizing novel sub-categories from scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge through either global or local representation approaches. The former employ global feature representations for recognition, which may lack fine-grained information. The latter capture local relationships with complex structures, which can lead to high model complexity. To address these challenges, this article proposes a novel framework called SGML-Net for few-shot fine-grained visual recognition. SGML-Net incorporates auxiliary information via saliency detection to guide discriminative representation learning, achieving high performance with low model complexity. Specifically, SGML-Net uses a saliency detection model to emphasize the key regions of each sub-category, providing a strong prior for representation learning. SGML-Net transfers this prior using two independent branches in a mutual learning paradigm. To make the transfer effective, SGML-Net leverages the relationships among different regions, making the representation more informative and thus providing better guidance. The auxiliary branch is removed once the transfer is complete, ensuring low model complexity in deployment. The proposed approach is empirically evaluated on three widely used benchmarks, demonstrating its superior performance.
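The two-branch mutual learning idea described above can be illustrated with a minimal sketch. The abstract does not specify the losses or the pooling used in SGML-Net, so everything here is an assumption: the sketch borrows the common deep-mutual-learning formulation, in which each branch minimizes cross-entropy on the label plus a KL term pulling it toward the other branch's prediction, and it uses a simple saliency-weighted average as a stand-in for how a saliency prior might emphasize key regions. The function names (`mutual_learning_losses`, `saliency_weighted_pool`) and the `weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mutual_learning_losses(main_logits, aux_logits, label, weight=1.0):
    """Return (main_loss, aux_loss) for one sample.

    Assumption: each branch is supervised by the label and additionally
    pulled toward the other branch's prediction, which is treated as a
    fixed target. This is not taken from the SGML-Net paper.
    """
    p_main = softmax(main_logits)
    p_aux = softmax(aux_logits)
    main_loss = cross_entropy(p_main, label) + weight * kl_divergence(p_aux, p_main)
    aux_loss = cross_entropy(p_aux, label) + weight * kl_divergence(p_main, p_aux)
    return main_loss, aux_loss


def saliency_weighted_pool(features, saliency):
    """Average per-region feature vectors, weighted by saliency scores.

    Assumption: `features` is a list of equal-length region vectors and
    `saliency` holds one non-negative score per region (not all zero).
    """
    total = sum(saliency)
    dim = len(features[0])
    return [
        sum(s * f[d] for s, f in zip(saliency, features)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two regions; the first is three times as salient as the second.
    pooled = saliency_weighted_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
    print("pooled feature:", pooled)

    # Hypothetical logits from the main and auxiliary branches.
    main_loss, aux_loss = mutual_learning_losses([2.0, 0.5], [1.0, 1.5], label=0)
    print("main loss:", main_loss, "aux loss:", aux_loss)
```

Under this reading, only the main branch would be kept at inference time, which matches the abstract's statement that the auxiliary branch is excluded after the transfer and adds no deployment cost.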