Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model CLIP to compensate for insufficient annotations. Despite their promising performance, they generally overlook the valuable prior about the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision in MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used benchmark datasets show that our method significantly outperforms existing methods on all datasets, demonstrating its effectiveness and superiority. Our code will be available at https://github.com/jameslahm/SCPNet.