This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic framework that generates prediction intervals for datasets with complete or missing data. SEMF extends the Expectation-Maximization (EM) algorithm, traditionally used in unsupervised learning, to a supervised context, enabling it to extract latent representations for uncertainty estimation. The framework demonstrates robustness through extensive empirical evaluation across 11 tabular datasets, achieving, in some cases, narrower normalized prediction intervals and higher coverage than traditional quantile regression methods. Furthermore, SEMF integrates seamlessly with existing machine learning algorithms, such as gradient-boosted trees and neural networks, exemplifying its usefulness for real-world applications. The experimental results highlight SEMF's potential to advance state-of-the-art techniques in uncertainty quantification.
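To make the abstract's evaluation criteria concrete, the following is a minimal sketch, not SEMF itself, of how prediction intervals are typically scored: sample multiple predictions per input (here via a hypothetical noise-perturbed point estimate standing in for latent sampling), take empirical quantiles as interval bounds, then compute coverage and normalized interval width, the two metrics the abstract compares against quantile regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + Gaussian noise.
x = rng.uniform(-1, 1, size=500)
y = 2.0 * x + rng.normal(0, 0.3, size=500)

# Hypothetical stand-in for a latent-sampling model: draw K predictions
# per input by perturbing a point estimate, then use empirical quantiles
# of those draws as the 95% prediction interval.
K = 200
point = 2.0 * x  # point prediction (true slope is known in this toy setup)
samples = point[:, None] + rng.normal(0, 0.3, size=(len(x), K))

lower = np.quantile(samples, 0.025, axis=1)
upper = np.quantile(samples, 0.975, axis=1)

# Coverage: fraction of targets falling inside their interval.
coverage = np.mean((y >= lower) & (y <= upper))
# Normalized width: mean interval width divided by the target range.
norm_width = np.mean(upper - lower) / (y.max() - y.min())

print(f"coverage={coverage:.3f}, normalized width={norm_width:.3f}")
```

A narrower normalized width at comparable (or higher) coverage is what the abstract reports SEMF achieving on some of the 11 datasets.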