In noisy partial label learning (NPLL), each training sample is associated with a set of candidate labels produced by multiple noisy annotators. With the emergence of high-performance pre-trained vision-language models (VLMs) such as CLIP, LLaVA, and GPT-4V, leveraging these models to replace time-consuming manual annotation and enable annotation-free training has become a promising research direction. This paper studies learning from noisy partial labels generated by pre-trained VLMs and proposes a collaborative consistency regularization (Co-Reg) framework. Unlike the symmetric noise commonly assumed in traditional noisy label learning, VLM-generated noise is instance-dependent and reflects the intrinsic biases of the pre-trained models, posing a greater challenge. To address this, we jointly train two neural networks that perform collaborative label purification via a co-pseudo-labeling mechanism, while enforcing consistency regularization in both the label space and the feature representation space. We further introduce several anti-overfitting strategies, including alternating optimization of contrastive representations and pseudo-labels, and maintaining class prototypes in a shared feature space. The proposed method can also incorporate few-shot manually annotated labels for further performance gains. Extensive experiments under various settings demonstrate the effectiveness of our approach and highlight the potential of integrating weakly supervised learning into the knowledge distillation of pre-trained models.
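The co-pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each network outputs class probabilities, masks them to the sample's candidate label set, renormalizes, and swaps the result so that each network is supervised by its peer's purified prediction. All function and variable names here are illustrative.

```python
import numpy as np

def co_pseudo_label(probs_a, probs_b, candidate_mask):
    """Purify each network's prediction by restricting it to the candidate
    label set, then swap: network A is trained toward B's purified
    distribution and vice versa (collaborative label purification)."""
    def purify(p):
        masked = p * candidate_mask                       # zero out non-candidate classes
        return masked / masked.sum(axis=1, keepdims=True) # renormalize to a distribution
    return purify(probs_b), purify(probs_a)               # targets for A, targets for B

# Toy example: 2 samples, 4 classes.
probs_a = np.array([[0.5, 0.2, 0.2, 0.1],
                    [0.1, 0.6, 0.2, 0.1]])
probs_b = np.array([[0.4, 0.3, 0.2, 0.1],
                    [0.2, 0.5, 0.2, 0.1]])
candidate_mask = np.array([[1, 1, 0, 0],   # candidate set {0, 1}
                           [0, 1, 1, 0]])  # candidate set {1, 2}

target_a, target_b = co_pseudo_label(probs_a, probs_b, candidate_mask)
```

In a full training loop, each target would then feed a cross-entropy (label-space consistency) term for the peer network, alongside the feature-space consistency and prototype terms the abstract mentions.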