In many classification problems, misclassification costs are highly asymmetric, while training labels are often corrupted by measurement error, annotator variability, or adversarial noise. The Neyman-Pearson multiclass classification (NPMC) framework addresses such asymmetry by controlling class-specific errors, but existing methods assume that training labels are correctly observed. To our knowledge, no existing approach handles NPMC under label noise in the multiclass setting, and the only binary method requires prior knowledge of the noise mechanism. A fundamental difficulty is that, without structural assumptions, noisy-label models are non-identifiable: distinct combinations of class-conditional distributions and noise mechanisms can induce the same observed distribution, preventing recovery of the quantities required for error control. We show that the exponential tilting density ratio model restores identifiability, and we leverage this structure to develop an empirical likelihood approach for NPMC with noisy labels. The proposed method jointly estimates clean-label class proportions, posterior probabilities, and the noise mechanism from noisy data, without requiring prior knowledge of the confusion matrix. An expectation-maximization algorithm enables efficient computation. The resulting estimators are root-n consistent and asymptotically normal, and the induced classifiers satisfy Neyman-Pearson oracle inequalities in both binary and multiclass settings. Simulation and real-data experiments demonstrate near-oracle performance.
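The non-identifiability claim can be made concrete with a small numerical sketch. The example below is illustrative only and is not taken from the paper: it assumes a discrete feature with three values, two classes, and class-conditional noise (the noisy label depends only on the true label). All numbers and the helper name `observed_joint` are hypothetical. It shows two different pairs of class-conditional distributions and noise matrices that produce the same joint distribution of feature and noisy label.

```python
from math import isclose


def observed_joint(pi, f, T):
    """Return P(X = x, noisy label = j) = sum_k pi[k] * f[k][x] * T[k][j].

    pi: clean-label class proportions
    f:  f[k][x] = P(X = x | true label = k)
    T:  T[k][j] = P(noisy label = j | true label = k)  (assumed noise model)
    """
    n_classes = len(pi)
    n_values = len(f[0])
    return [
        [sum(pi[k] * f[k][x] * T[k][j] for k in range(n_classes))
         for j in range(n_classes)]
        for x in range(n_values)
    ]


# Model A (hypothetical numbers): well-separated classes, 20% symmetric flips.
pi_a = [0.5, 0.5]
f_a = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
T_a = [[0.8, 0.2], [0.2, 0.8]]

# Model B (hypothetical numbers): less separated classes, no label noise.
pi_b = [0.5, 0.5]
f_b = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
T_b = [[1.0, 0.0], [0.0, 1.0]]

joint_a = observed_joint(pi_a, f_a, T_a)
joint_b = observed_joint(pi_b, f_b, T_b)

# The two models agree on everything that can be observed ...
same_observed = all(
    isclose(a, b, abs_tol=1e-12)
    for row_a, row_b in zip(joint_a, joint_b)
    for a, b in zip(row_a, row_b)
)
# ... yet their class-conditional distributions and noise matrices differ.
models_differ = f_a != f_b and T_a != T_b

print("same observed distribution:", same_observed)
print("underlying models differ:", models_differ)
```

Because the noisy data cannot distinguish Model A from Model B, quantities such as clean-label class-specific errors are not recoverable without further structure, which is the role the abstract assigns to the exponential tilting density ratio model.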