Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels

Multi-label image recognition with partial labels (MLR-PL), in which only some labels are known for each image and the rest are unknown, can greatly reduce annotation cost and thus facilitate large-scale MLR. We find that strong semantic correlations exist both within each image and across different images, and that these correlations can help transfer the knowledge carried by the known labels to recover the unknown labels, thereby improving MLR-PL performance (see Figure 1). In this work, we propose a novel heterogeneous semantic transfer (HST) framework consisting of two complementary transfer modules that exploit within-image and cross-image semantic correlations, respectively, to generate pseudo labels for the unknown labels from the known ones. Specifically, an intra-image semantic transfer (IST) module learns an image-specific label co-occurrence matrix for each image and uses it to map the known labels onto the unknown labels. In addition, a cross-image transfer (CST) module learns category-specific feature-prototype similarities and assigns pseudo labels to unknown labels whose features are highly similar to the corresponding prototypes. Finally, both the known labels and the generated pseudo labels are used to train MLR models. Extensive experiments on the Microsoft COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed HST framework outperforms current state-of-the-art algorithms. Specifically, it improves mean average precision (mAP) by 1.4%, 3.3%, and 0.4% on the three datasets, respectively, over the best-performing previous algorithm.
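
To make the two transfer paths concrete, the following is a minimal NumPy sketch of how pseudo labels might be derived from a co-occurrence matrix and from feature-prototype similarities. It is an illustration under stated assumptions, not the HST implementation: in the framework both quantities are learned, whereas here they are fixed toy arrays. The function names, the label encoding (1 = known positive, -1 = known negative, 0 = unknown), the use of cosine similarity, the max-over-known-positives rule, and the 0.9 thresholds are all hypothetical choices made for this example.

```python
import numpy as np


def intra_image_pseudo_labels(labels, cooccurrence, threshold=0.9):
    """Mark unknown labels strongly implied by a known positive label.

    labels: (C,) array, 1 = known positive, -1 = known negative, 0 = unknown.
    cooccurrence: (C, C) array; entry [i, j] is an assumed estimate of how
        likely label j is present given that label i is present.
    Returns a boolean mask over the C categories.
    """
    labels = np.asarray(labels)
    known_pos = labels == 1
    unknown = labels == 0
    if not known_pos.any():
        return np.zeros(labels.shape, dtype=bool)
    # Strongest support each category receives from any known positive label.
    support = cooccurrence[known_pos].max(axis=0)
    return unknown & (support >= threshold)


def cross_image_pseudo_labels(labels, features, prototypes, threshold=0.9):
    """Mark unknown labels whose feature is close to the category prototype.

    features: (C, D) category-specific features of one image (assumed given).
    prototypes: (C, D) one prototype per category (assumed given).
    Cosine similarity is an assumption made for this sketch.
    """
    labels = np.asarray(labels)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    similarity = (f * p).sum(axis=1)
    return (labels == 0) & (similarity >= threshold)


def complete_labels(labels, intra_mask, cross_mask):
    """Fill unknown entries flagged by either path with a positive pseudo label."""
    completed = np.asarray(labels).copy()
    completed[intra_mask | cross_mask] = 1
    return completed


# Toy example with four categories: category 0 is a known positive,
# categories 1 and 2 are unknown, category 3 is a known negative.
labels = np.array([1, 0, 0, -1])
cooccurrence = np.array([
    [1.00, 0.95, 0.20, 0.10],
    [0.90, 1.00, 0.10, 0.10],
    [0.30, 0.10, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
])
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
prototypes = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 1.0]])

intra_mask = intra_image_pseudo_labels(labels, cooccurrence)
cross_mask = cross_image_pseudo_labels(labels, features, prototypes)
completed = complete_labels(labels, intra_mask, cross_mask)
print(completed)
```

In this toy setup the within-image path fills category 1 (it co-occurs strongly with the known positive category 0), while the cross-image path fills category 2 (its feature aligns with its prototype). The known negative in category 3 is left untouched even though its feature matches its prototype, because only unknown entries are eligible for pseudo labels.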
