Noisy labels are both inevitable and problematic in machine learning: they cause overfitting and thereby degrade a model's generalization ability. In learning with noisy labels, the transition matrix plays a crucial role in the design of statistically consistent algorithms. However, the transition matrix is often considered unidentifiable. One strand of methods addresses this problem by assuming that the transition matrix is instance-independent; that is, the probability of mislabeling a particular instance is not influenced by its characteristics or attributes. This assumption is clearly invalid in complex real-world scenarios. To better understand the transition relationship and relax this assumption, we propose to study the data generation process of noisy labels from a causal perspective. We find that an unobservable latent variable can affect the instance itself, the label annotation procedure, or both, which complicates the identification of the transition matrix. To cover these scenarios, we unify the observations within a new causal graph. In this graph, the input instance is divided into a noise-resistant component and a noise-sensitive component, according to whether each is affected by the latent variable. These two components contribute to identifying the ``causal transition matrix'', which approximates the true transition matrix with a theoretical guarantee. Accordingly, we design a novel training framework that explicitly models this causal relationship and, as a result, yields a more accurate model for inferring the clean label.
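To make the instance-independence assumption concrete, the following minimal sketch shows how a single transition matrix maps a clean class posterior to a noisy one. It is an illustration only, not the framework proposed here: the function name `noisy_posterior`, the three-class setup, and all numbers are hypothetical.

```python
# Illustrative sketch only; every name and number below is a made-up assumption.

def noisy_posterior(clean_posterior, transition):
    """Map P(Y | x) to P(noisy Y | x) under an instance-independent matrix.

    transition[i][j] = P(noisy label = j | clean label = i), shared by all x.
    """
    num_classes = len(clean_posterior)
    return [
        sum(clean_posterior[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]

# Row i is the noisy-label distribution given clean class i, so each row sums to 1.
T = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]

clean_a = [0.9, 0.05, 0.05]  # hypothetical clean posterior of an easy instance
clean_b = [0.4, 0.4, 0.2]    # hypothetical clean posterior of an ambiguous instance

# The same T is applied to both instances, whatever their features look like.
noisy_a = noisy_posterior(clean_a, T)
noisy_b = noisy_posterior(clean_b, T)
```

Both instances pass through the same matrix `T`, which is exactly the restriction described above: an instance-dependent model would instead let the matrix vary with the instance.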