Mixture proportion estimation (MPE) aims to estimate class priors from unlabeled data. It is a critical component of many weakly supervised learning problems, such as positive-unlabeled (PU) learning, learning with label noise, and domain adaptation. Existing MPE methods rely on the \textit{irreducibility} assumption, or a variant of it, for identifiability. In this paper, we propose novel assumptions based on conditional independence (CI) given the class label, which ensure identifiability even when irreducibility does not hold. We develop method-of-moments estimators under these assumptions and analyze their asymptotic properties. Furthermore, we present weakly supervised kernel tests to validate the CI assumptions; these tests are of independent interest in applications such as causal discovery and fairness evaluation. Empirically, we show that our estimators outperform existing methods and that our tests successfully control both type I and type II errors.\label{key}