Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model CLIP to compensate for insufficient annotations. Despite their promising performance, they generally overlook the valuable prior about the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used benchmark datasets show that our method significantly outperforms existing methods on all datasets, demonstrating its effectiveness and superiority. Our code will be available at https://github.com/jameslahm/SCPNet.
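To make the notion of a label-to-label correspondence prior more concrete, the following is a minimal illustrative sketch. It is not the SCPNet semantic prior prompter. It builds a row-normalized relation matrix from label embeddings by applying a softmax over the pairwise cosine similarities between each label and every other label. Several things here are assumptions chosen only for illustration: the toy 3-d vectors, which stand in for text-encoder embeddings of label prompts; the cosine-plus-softmax construction; the helper names `cosine` and `label_prior`; and the temperature value.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def label_prior(
    embeddings: Dict[str, List[float]], temperature: float = 0.1
) -> Dict[str, Dict[str, float]]:
    """Hypothetical label-to-label relation matrix (needs >= 2 labels).

    Each row is a softmax over temperature-scaled cosine similarities to
    the *other* labels, so self-similarity is excluded and each row sums to 1.
    """
    labels = list(embeddings)
    prior: Dict[str, Dict[str, float]] = {}
    for a in labels:
        sims = {
            b: cosine(embeddings[a], embeddings[b]) / temperature
            for b in labels
            if b != a
        }
        m = max(sims.values())  # subtract the max for numerical stability
        exps = {b: math.exp(s - m) for b, s in sims.items()}
        z = sum(exps.values())
        prior[a] = {b: e / z for b, e in exps.items()}
    return prior


# Toy vectors standing in for label-prompt embeddings
# (hypothetical values, not real CLIP outputs).
toy = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "car": [0.0, 0.2, 0.9],
}
prior = label_prior(toy)
print(max(prior["dog"], key=prior["dog"].get))  # cat
```

In this toy setting, "dog" places most of its prior mass on the semantically close "cat" rather than on "car". This is the kind of label-to-label cue that can supplement missing annotations. A lower temperature sharpens each row toward its most similar label.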