Learning from limited data, known as few-shot learning, is a challenging computer vision task. Several works exploit semantics and design complicated semantic fusion mechanisms to compensate for the scarcity of representative features in limited data. However, relying on naive semantics such as class names introduces biases because they are so brief, while acquiring extensive semantics from external knowledge requires substantial time and effort. This limitation severely constrains the potential of semantics in few-shot learning. In this paper, we design an automatic method, called Semantic Evolution, to generate high-quality semantics. Incorporating high-quality semantics alleviates the need for the complex network structures and learning algorithms used in previous works. Hence, we employ a simple two-layer network, termed the Semantic Alignment Network, to transform semantics and visual features into robust class prototypes with rich discriminative features for few-shot classification. Experimental results show that our framework outperforms all previous methods on six benchmarks, demonstrating that a simple network with high-quality semantics can beat intricate multi-modal modules on few-shot classification tasks. Code is available at https://github.com/zhangdoudou123/SemFew.
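To make the idea of turning semantics and visual features into class prototypes more concrete, the following is a minimal sketch in plain Python, not the implementation from the linked repository. It rests on several assumptions that the abstract does not state: the mean support feature and the semantic embedding are fused by concatenation, the two layers are dense layers with a ReLU in between, queries are assigned by cosine similarity, and the weights are random stand-ins for trained parameters. All function names (`build_prototype`, `classify`, and the helpers) are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Random weights as a stand-in; a trained model would load learned values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_prototype(support_feats, semantic_emb, layer1, layer2):
    """Map (mean visual feature, semantic embedding) to a class prototype.

    Assumption: fusion is plain concatenation followed by a two-layer MLP.
    """
    fused = mean_vector(support_feats) + list(semantic_emb)
    hidden = relu(linear(fused, *layer1))
    return linear(hidden, *layer2)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # epsilon guards against zero vectors


def classify(query_feat, prototypes):
    """Return the class whose prototype is most cosine-similar to the query."""
    return max(prototypes, key=lambda name: cosine(query_feat, prototypes[name]))


# Toy 2-way 2-shot episode with made-up 4-d visual features and 3-d semantics.
rng = random.Random(0)
visual_dim, semantic_dim, hidden_dim = 4, 3, 8
layer1 = init_layer(visual_dim + semantic_dim, hidden_dim, rng)
layer2 = init_layer(hidden_dim, visual_dim, rng)

episode = {
    "cat": ([[1.0, 0.2, 0.0, 0.1], [0.9, 0.1, 0.1, 0.0]], [0.5, 0.1, 0.0]),
    "dog": ([[0.0, 0.1, 1.0, 0.3], [0.1, 0.0, 0.8, 0.2]], [0.0, 0.2, 0.6]),
}
prototypes = {name: build_prototype(feats, sem, layer1, layer2)
              for name, (feats, sem) in episode.items()}
predicted = classify([0.95, 0.15, 0.05, 0.05], prototypes)
```

With untrained weights the prediction carries no meaning; the sketch only shows the data flow from support features and a semantic embedding to a prototype and then to a nearest-prototype decision.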