Few-shot learning aims to recognize novel categories from only a handful of labeled samples, but prototypes estimated from such scarce data are often biased and generalize poorly. Semantic-based methods alleviate this by introducing coarse class-level information, yet they are mostly applied on the support side, leaving query representations unchanged. In this paper, we present PMCE, a Probabilistic few-shot framework that leverages Multi-granularity semantics with Caption-guided Enhancement. PMCE constructs a nonparametric knowledge bank that stores visual statistics for each base class together with CLIP-encoded embeddings of the base class names. At meta-test time, for each novel category the most relevant base classes are retrieved by class-name embedding similarity; their stored statistics are aggregated into category-specific prior information and fused with the support-set prototypes via a simple maximum a posteriori (MAP) update. Simultaneously, a frozen BLIP captioner provides label-free instance-level image descriptions, and a lightweight enhancer trained on the base classes refines both support prototypes and query features under an inductive protocol, with consistency regularization that stabilizes training against noisy captions. Experiments on four benchmarks show that PMCE consistently improves over strong baselines, achieving up to a 7.71% absolute gain over the strongest semantic competitor on MiniImageNet in the 1-shot setting. Our code is available at https://anonymous.4open.science/r/PMCE-275D
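The retrieve-and-fuse step above can be sketched as follows. This is a minimal illustrative implementation, not PMCE's exact formulation: the function name, the softmax weighting of retrieved base-class means, and the pseudo-count `kappa` are assumptions; the MAP update shown is the standard conjugate-Gaussian shrinkage of the empirical prototype toward the retrieved prior.

```python
import numpy as np

def map_fuse_prototype(support_feats, novel_name_emb, base_name_embs,
                       base_means, top_k=2, kappa=10.0):
    """Fuse a support-set prototype with a retrieved base-class prior
    via a conjugate-Gaussian MAP update (illustrative sketch).

    support_feats : (n_shot, d)  features of the novel class's support images
    novel_name_emb: (t,)         CLIP embedding of the novel class name
    base_name_embs: (B, t)       CLIP embeddings of the base class names
    base_means    : (B, d)       stored visual mean of each base class
    """
    # Retrieve the top-k most relevant base classes by cosine
    # similarity of class-name embeddings.
    sims = base_name_embs @ novel_name_emb
    sims = sims / (np.linalg.norm(base_name_embs, axis=1)
                   * np.linalg.norm(novel_name_emb) + 1e-8)
    idx = np.argsort(-sims)[:top_k]

    # Aggregate the retrieved statistics into a similarity-weighted
    # prior mean (softmax over retrieved similarities).
    w = np.exp(sims[idx]) / np.exp(sims[idx]).sum()
    prior_mean = (w[:, None] * base_means[idx]).sum(axis=0)

    # MAP update: shrink the empirical prototype toward the prior;
    # kappa acts as the prior's pseudo-count, so the prior dominates
    # in the low-shot regime and fades as n_shot grows.
    n = support_feats.shape[0]
    emp_proto = support_feats.mean(axis=0)
    return (n * emp_proto + kappa * prior_mean) / (n + kappa)
```

With `kappa = 0` the update reduces to the plain support-set mean, which makes the role of the prior strength explicit.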