Noisy multi-label learning has garnered increasing attention because collecting accurate labels at scale is difficult, which makes noisy labels a more practical alternative. Motivated by noisy multi-class learning, transition matrices can be introduced to model multi-label noise and to enable statistically consistent algorithms for noisy multi-label learning. However, estimating multi-label noise transition matrices remains challenging: most existing estimators from noisy multi-class learning rely on anchor points and on accurate fitting of noisy class posteriors, requirements that are hard to satisfy in noisy multi-label learning. In this paper, we address this problem by first investigating the identifiability of class-dependent transition matrices in noisy multi-label learning. Building on the identifiability results, we propose a novel estimator that leverages label correlations and needs neither anchor points nor precise fitting of noisy class posteriors. Specifically, we first estimate the co-occurrence probability of two noisy labels to capture noisy label correlations. We then employ sample selection techniques to extract information that implies clean label correlations, and use it to estimate the occurrence probability of one noisy label given that a certain clean label appears. By exploiting the mismatch between the label correlations implied by these two occurrence probabilities, we show that the transition matrix becomes identifiable and can be obtained by solving a bilinear decomposition problem. Theoretically, we establish an estimation error bound for our multi-label transition matrix estimator and derive a generalization error bound for our statistically consistent algorithm. Empirically, we validate the effectiveness of our estimator in estimating multi-label noise transition matrices and show that it leads to strong classification performance.
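The identifiability idea can be illustrated with a small population-level sketch for two binary labels i and j. This is not the authors' implementation. It assumes that the noisy labels are conditionally independent given the clean labels, and that P(noisy_i | clean_j) is known exactly, whereas the paper estimates it by sample selection. Under those assumptions the bilinear problem reduces to a linear solve followed by a row normalization. The function name `recover_transition` and all numbers below are hypothetical.

```python
import numpy as np

def recover_transition(noisy_joint, cond_noisy_given_clean):
    """Recover P(clean_j) and T_j from population-level quantities.

    noisy_joint[a, b]            = P(noisy_i = a, noisy_j = b)
    cond_noisy_given_clean[k, a] = P(noisy_i = a | clean_j = k)
    Returns (prior, T) with prior[k] = P(clean_j = k) and
    T[k, b] = P(noisy_j = b | clean_j = k).
    """
    # Assumed conditional independence gives
    #   noisy_joint[a, b] = sum_k cond[k, a] * P(clean_j = k) * T[k, b].
    # With J[k, b] = P(clean_j = k) * T[k, b], this reads noisy_joint = cond.T @ J.
    J = np.linalg.solve(cond_noisy_given_clean.T, noisy_joint)
    prior = J.sum(axis=1)      # each row of T sums to 1, so row sums give the prior
    T = J / prior[:, None]     # normalize rows to obtain the transition matrix
    return prior, T

# Hypothetical ground truth, used only to generate consistent inputs.
clean_joint = np.array([[0.5, 0.1],
                        [0.1, 0.3]])   # [u, k] = P(clean_i = u, clean_j = k)
T_i = np.array([[0.9, 0.1],
                [0.2, 0.8]])           # [u, a] = P(noisy_i = a | clean_i = u)
T_j = np.array([[0.8, 0.2],
                [0.3, 0.7]])           # [k, b] = P(noisy_j = b | clean_j = k)

# Quantities the estimator would observe or estimate from data.
noisy_joint = T_i.T @ clean_joint @ T_j    # P(noisy_i = a, noisy_j = b)
prior_j = clean_joint.sum(axis=0)          # P(clean_j = k)
cond = (clean_joint / prior_j).T @ T_i     # P(noisy_i = a | clean_j = k)

prior_est, T_est = recover_transition(noisy_joint, cond)
print(T_est)
```

The linear solve requires `cond` to be invertible, that is, the distribution of noisy label i must differ depending on the clean value of label j. If the two labels were uncorrelated, the rows of `cond` would coincide and the transition matrix could not be recovered this way, which is why the approach depends on label correlations.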