Partial label learning (PLL) is a typical weakly supervised learning problem in which each training example is associated with a set of candidate labels, only one of which is true. Most existing PLL approaches assume that the incorrect labels in each candidate set are picked at random, and therefore model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected because the generation process of the candidate labels is always instance-dependent and thus deserves more refined modeling. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels can be decomposed into two sequential parts: the correct label first emerges in the mind of the annotator, and then, owing to labeling uncertainty, incorrect labels related to the features are selected together with the correct label as candidate labels. Motivated by this consideration, we propose a novel PLL method that performs maximum a posteriori (MAP) estimation based on an explicitly modeled generation process of candidate labels via decomposed probability distribution models. Extensive experiments on manually corrupted benchmark datasets and real-world datasets validate the effectiveness of the proposed method. Source code is available at https://github.com/palm-ml/idgp.
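To make the two-part generation process concrete, the following is a minimal sketch of how an instance-dependent candidate set might be simulated. It is not the model proposed in the paper: the function name `generate_candidates`, the per-class `scores` (standing in for how plausible each label looks for a given instance), and the rule that an incorrect label is added with probability proportional to its score are all assumptions made for illustration only.

```python
import random


def generate_candidates(scores, true_label, rng):
    """Simulate a candidate label set for one instance (illustrative only).

    scores     : hypothetical per-class plausibility scores for this instance,
                 non-negative, with at least two classes
    true_label : index of the correct label
    rng        : a random.Random instance
    Returns a 0/1 list marking which labels are in the candidate set.
    """
    candidates = [0] * len(scores)

    # Part 1: the correct label emerges first and is always a candidate.
    candidates[true_label] = 1

    # Part 2: incorrect labels join the set with an instance-dependent
    # probability. Assumption: that probability is the label's score divided
    # by the largest score among the incorrect labels.
    wrong_scores = [s for j, s in enumerate(scores) if j != true_label]
    max_wrong = max(wrong_scores)
    for j, s in enumerate(scores):
        if j == true_label or max_wrong <= 0:
            continue
        if rng.random() < s / max_wrong:
            candidates[j] = 1

    return candidates


# Example: class 0 is correct; class 1 is the most confusable alternative.
rng = random.Random(0)
scores = [0.7, 0.2, 0.1, 0.0]
print(generate_candidates(scores, true_label=0, rng=rng))
```

Under this assumed rule, the most confusable incorrect label always enters the candidate set and a label with zero score never does, so the candidate set depends on the instance rather than being drawn uniformly at random.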