Few-shot learning is a challenging problem because only a few examples are available for recognizing a new class. Several recent studies exploit additional semantic information, e.g., text embeddings of class names, to mitigate the scarcity of samples by combining semantic prototypes with visual prototypes. However, these methods still suffer from spurious visual features learned from the scarce support samples, which limits their benefit. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of naively using semantic information to remedy classifiers, we leverage it as prompts to adaptively tune the visual feature extraction network. Specifically, we design two complementary mechanisms for inserting semantic prompts into the feature extractor: one enables interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention, and the other supplements visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor attends better to class-specific features and obtains more generalizable image representations from only a few support samples. In extensive experiments on four datasets, the proposed approach achieves promising results, improving 1-shot learning accuracy by 3.67% on average.
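To make the two insertion mechanisms concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single attention head with identity query/key/value projections, a simple additive form for the channel-wise supplement, and toy dimensions; the function names (`spatial_interaction`, `channel_interaction`) and the projection matrices `W_proj` and `W_mod` are hypothetical and chosen only for illustration.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_interaction(patches, prompt, W_proj):
    """Prepend the projected semantic prompt as an extra token, then apply
    single-head self-attention over the sequence (identity Q/K/V is an
    assumption made to keep the sketch short)."""
    token = matvec(W_proj, prompt)      # project prompt to the patch dimension
    seq = [token] + patches             # sequence length becomes N + 1
    d = len(token)
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out

def channel_interaction(patches, prompt, W_mod):
    """Add a transformed prompt vector to every patch along the channel axis
    (an assumed additive form of the channel-wise supplement)."""
    shift = matvec(W_mod, prompt)
    return [[p + s for p, s in zip(patch, shift)] for patch in patches]

# Toy inputs: 3 patches with 2 channels each, and a 3-dimensional class embedding.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt = [0.5, -0.5, 1.0]
W_proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical 2x3 projection
W_mod = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]   # hypothetical 2x3 projection

attended = spatial_interaction(patches, prompt, W_proj)
# Drop the prompt token before the channel-wise step.
modulated = channel_interaction(attended[1:], prompt, W_mod)
```

In this sketch the spatial step lets every patch attend to the prompt token, while the channel step shifts each patch feature by the same prompt-derived vector. A real feature extractor would use learned projections and multiple layers.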