In partial label learning (PLL), each sample is associated with a candidate label set comprising the ground-truth label and several noisy labels. Conventional PLL assumes the noisy labels are randomly generated (instance-independent), whereas in practical scenarios the noisy labels are typically instance-dependent and highly related to the sample features, giving rise to the instance-dependent partial label learning (IDPLL) problem. Instance-dependent noisy labels are a double-edged sword. On the one hand, they may promote model training, as the noisy labels depict the sample to some extent. On the other hand, they bring high label ambiguity, as the noisy labels are hard to distinguish from the ground-truth label. To effectively leverage the nuances of IDPLL, we create, for the first time, class-wise embeddings for each sample, which allow us to exploit the relationships among instance-dependent noisy labels: the class-wise embeddings within the candidate label set should be highly similar, while those between the candidate label set and the non-candidate label set should be highly dissimilar. Moreover, to reduce the high label ambiguity, we introduce class prototypes that carry global feature information to disambiguate the candidate label set. Extensive experimental comparisons with twelve methods on six benchmark data sets, including four fine-grained data sets, demonstrate the effectiveness of the proposed method. The code implementation is publicly available at https://github.com/Yangfc-ML/CEL.
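The relational constraint described above can be sketched as a simple similarity objective over class-wise embeddings. This is an illustrative assumption, not the paper's exact loss: the function name, the cosine-similarity formulation, and the subtraction of the two mean-similarity terms are all hypothetical choices made here for concreteness.

```python
import numpy as np

def candidate_similarity_loss(embeddings, candidate_mask):
    """Hypothetical sketch of the class-wise embedding constraint:
    pull embeddings of candidate classes together, push candidate vs
    non-candidate embeddings apart.

    embeddings: (C, d) array, one embedding per class for a single sample.
    candidate_mask: boolean (C,) array, True for candidate labels.
    Returns a scalar to minimize: cross-set similarity minus
    within-candidate similarity.
    """
    # L2-normalize so dot products become cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T  # (C, C) pairwise cosine-similarity matrix

    cand = np.where(candidate_mask)[0]
    non_cand = np.where(~candidate_mask)[0]

    # Mean similarity among candidate classes, excluding self-pairs
    n = len(cand)
    intra = (sim[np.ix_(cand, cand)].sum() - n) / max(n * (n - 1), 1)

    # Mean similarity between candidate and non-candidate classes
    inter = sim[np.ix_(cand, non_cand)].mean() if len(non_cand) else 0.0

    # Minimizing this maximizes intra-candidate similarity and
    # minimizes candidate/non-candidate similarity
    return inter - intra
```

In this sketch, perfectly aligned candidate embeddings that are orthogonal to all non-candidate embeddings yield the minimum value of -1.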