Zero-shot learning (ZSL) aims to recognize classes that have no samples in the training set. One representative solution is to directly learn an embedding function that associates visual features with the corresponding class semantics in order to recognize new classes. Many methods build on this solution, and recent ones focus especially on extracting rich features from images, e.g., attribute features. These attribute features are normally extracted within each individual image; however, the traits shared by features that come from different images but belong to the same attribute are not emphasized. In this paper, we propose a new framework to boost ZSL by explicitly learning attribute prototypes beyond images and contrastively optimizing them with attribute-level features within images. Besides the novel architecture, two elements are highlighted for attribute representations: a new prototype generation module is designed to generate attribute prototypes from attribute semantics, and a hard example-based contrastive optimization scheme is introduced to reinforce attribute-level features in the embedding space. We explore two alternative backbones, CNN-based and transformer-based, to build our framework and conduct experiments on three standard benchmarks: CUB, SUN, and AwA2. Results on these benchmarks demonstrate that our method improves the state of the art by a considerable margin. Our code will be available at https://github.com/dyabel/CoAR-ZSL.git
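To make the idea of contrasting attribute-level features against attribute prototypes more concrete, the following is a minimal NumPy sketch, not the implementation from the repository above. It assumes an InfoNCE-style loss over cosine similarities and assumes that "hard examples" means the wrong prototypes most similar to each feature; the function name `prototype_contrastive_loss` and the parameters `temperature` and `num_hard` are hypothetical. Prototypes are passed in as a plain array, whereas in the described framework they would come from the prototype generation module.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def prototype_contrastive_loss(features, attr_ids, prototypes,
                               temperature=0.1, num_hard=2):
    """Illustrative loss (an assumption, not the paper's exact formulation).

    features:   (N, D) attribute-level features gathered from many images
    attr_ids:   (N,)   index of the attribute each feature belongs to
    prototypes: (K, D) one prototype per attribute, shared across images
    """
    f = l2_normalize(np.asarray(features, dtype=float))
    p = l2_normalize(np.asarray(prototypes, dtype=float))
    sim = f @ p.T / temperature                      # (N, K) scaled cosine similarity
    rows = np.arange(sim.shape[0])

    pos = sim[rows, attr_ids]                        # similarity to the correct prototype

    neg = sim.copy()
    neg[rows, attr_ids] = -np.inf                    # mask the positive out of the negatives
    # Assumed hard-example rule: keep the num_hard most similar wrong prototypes.
    hard_neg = -np.sort(-neg, axis=1)[:, :num_hard]  # (N, num_hard), descending

    # Cross-entropy with the positive in column 0, using a stable log-sum-exp.
    logits = np.concatenate([pos[:, None], hard_neg], axis=1)
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_z - pos))


# Toy usage: four attributes with orthogonal prototypes.
prototypes = np.eye(4)
attr_ids = np.array([0, 1, 2, 3])
aligned = prototypes[attr_ids]                    # features matching their own prototype
misaligned = prototypes[np.roll(attr_ids, 1)]     # features matching a wrong prototype

loss_aligned = prototype_contrastive_loss(aligned, attr_ids, prototypes)
loss_misaligned = prototype_contrastive_loss(misaligned, attr_ids, prototypes)
```

In this toy setup, features that coincide with their own attribute prototype receive a much smaller loss than features that coincide with a different prototype, which is the behavior a prototype-based contrastive objective is meant to encourage. Restricting the denominator to the hardest negatives is one simple way to focus the optimization on confusable attributes; other mining strategies would fit the same description.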