Few-shot image classification has received considerable attention as a way to address poor classification performance on novel classes with limited samples. Numerous studies have addressed this issue, but they rely on sophisticated learning strategies and diversified feature extraction methods. In this paper, we propose PrototypeFormer, a method that aims to significantly advance traditional few-shot image classification approaches by exploring prototype relationships. Specifically, we use a transformer architecture to build a prototype extraction module that extracts class representations that are more discriminative for few-shot classification. In addition, we propose a contrastive learning-based optimization approach to optimize prototype features during training in few-shot learning scenarios. Despite its simplicity, the method performs remarkably well, with no bells and whistles. Experiments on several popular few-shot image classification benchmark datasets show that our method outperforms all current state-of-the-art methods. In particular, it achieves 97.07% and 90.88% accuracy on the 5-way 5-shot and 5-way 1-shot tasks of miniImageNet, surpassing the state-of-the-art results by 7.27% and 8.72% in accuracy, respectively. The code will be released later.
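For readers unfamiliar with prototype-based few-shot classification, the following is a minimal sketch of the generic baseline that the abstract builds on: each class prototype is the plain mean of its support embeddings, and a query is assigned to the nearest prototype. This is not PrototypeFormer itself; the abstract states that a transformer-based module extracts the prototypes, and that module is not reproduced here. The function names, the toy 2-way 2-shot episode, and the two-dimensional embeddings are all assumptions made for illustration.

```python
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    `support` is a list of (label, embedding) pairs. The embeddings are
    hypothetical pre-extracted feature vectors of equal length.
    """
    grouped = defaultdict(list)
    for label, embedding in support:
        grouped[label].append(embedding)
    # Mean over the support samples, computed one dimension at a time.
    return {
        label: [sum(dim) / len(embeddings) for dim in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy 2-way 2-shot episode with made-up embeddings (illustrative only).
support = [
    ("cat", [1.0, 0.0]),
    ("cat", [0.8, 0.2]),
    ("dog", [0.0, 1.0]),
    ("dog", [0.2, 0.8]),
]
prototypes = class_prototypes(support)
prediction = classify([0.7, 0.1], prototypes)
```

In this baseline the prototype is a simple average, so a single atypical support sample can shift it noticeably, especially in the 1-shot setting. A learned prototype extractor, such as the one the abstract describes, would take the place of `class_prototypes`.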