This paper presents a semi-supervised learning framework for Gaussian mixture modelling when class labels are missing at random (MAR). The method explicitly parameterizes the missingness mechanism by modelling the probability that a label is missing as a function of classification uncertainty. To quantify that uncertainty, we introduce margin confidence and adopt the Aranda-Ordaz (AO) link function, which flexibly captures asymmetric relationships between uncertainty and the probability of missingness. Based on this formulation, we develop an efficient Expectation Conditional Maximization (ECM) algorithm that jointly estimates the parameters of both the Gaussian mixture model (GMM) and the missingness mechanism, and then imputes the missing labels with a Bayesian classifier derived from the fitted mixture model. The method alleviates the bias induced by ignoring the missingness mechanism and improves the robustness of semi-supervised learning. The resulting uncertainty-aware framework delivers reliable classification performance in realistic MAR scenarios with substantial proportions of missing labels.
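To make the missingness model concrete, the following is a minimal sketch of how a label's missing probability could be computed from classification uncertainty. It is not the paper's implementation. Three things are assumptions made for illustration: margin confidence is taken to be the gap between the two largest posterior class probabilities, the linear predictor is `xi0 + xi1 * margin` with hypothetical coefficients `xi0` and `xi1`, and the AO link is used in its standard asymmetric form with shape parameter `lam` (logistic at `lam = 1`, complementary log-log as `lam` tends to 0). All function names are hypothetical.

```python
import numpy as np

def gmm_posteriors(X, weights, means, covs):
    """Posterior class probabilities of each row of X under a Gaussian mixture."""
    n, d = X.shape
    log_p = np.empty((n, len(weights)))
    for k, (pi_k, mu, cov) in enumerate(zip(weights, means, covs)):
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every point to component k.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        log_p[:, k] = np.log(pi_k) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    # Normalize in log space for numerical stability.
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def margin_confidence(tau):
    """Assumed definition: largest posterior minus second-largest posterior."""
    top2 = np.sort(tau, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def ao_inverse_link(eta, lam):
    """Inverse of the asymmetric Aranda-Ordaz link, for lam >= 0."""
    eta = np.asarray(eta, dtype=float)
    if lam == 0:
        return 1.0 - np.exp(-np.exp(eta))  # complementary log-log limit
    return 1.0 - (1.0 + lam * np.exp(eta)) ** (-1.0 / lam)

def missing_probability(tau, xi0, xi1, lam):
    """Probability that a label is missing, driven by margin confidence."""
    return ao_inverse_link(xi0 + xi1 * margin_confidence(tau), lam)

# Two well-separated components; one point on the boundary, one near a mean.
X = np.array([[0.0, 0.0], [3.0, 0.0]])
weights = np.array([0.5, 0.5])
means = np.array([[-3.0, 0.0], [3.0, 0.0]])
covs = np.array([np.eye(2), np.eye(2)])

tau = gmm_posteriors(X, weights, means, covs)
p_miss = missing_probability(tau, xi0=0.0, xi1=-3.0, lam=1.0)
print(p_miss)
```

With a negative `xi1`, the point on the decision boundary (low margin) receives a higher missing probability than the point near a component mean, which matches the idea that harder-to-classify observations are more likely to be unlabelled. In the full method these coefficients would be estimated jointly with the mixture parameters rather than fixed by hand.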