This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic approach for generating prediction intervals in datasets with complete or missing data. SEMF extends the Expectation-Maximization algorithm, traditionally used in unsupervised learning, to a supervised context, leveraging latent variable modeling for uncertainty estimation. Extensive empirical evaluations across 11 tabular datasets show that SEMF often achieves narrower normalized prediction intervals and higher coverage rates than traditional quantile regression methods. Furthermore, SEMF can be integrated with machine learning models like gradient-boosted trees and neural networks, highlighting its practical applicability. The results indicate that SEMF enhances uncertainty quantification, particularly in scenarios with complete data.
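The two evaluation criteria named above, coverage rate and normalized interval width, are standard metrics for prediction intervals. The sketch below illustrates how they are typically computed; the arrays are illustrative toy values, not results from SEMF, and the exact normalization used in the paper may differ.

```python
import numpy as np

# Illustrative interval predictions (lower, upper) for test targets y;
# these values are made up for demonstration only.
y = np.array([1.0, 2.5, 3.0, 4.2, 5.1])
lower = np.array([0.5, 2.0, 2.8, 3.5, 4.0])
upper = np.array([1.5, 3.0, 3.6, 5.0, 6.0])

# Coverage rate: fraction of true targets falling inside their interval.
coverage = np.mean((y >= lower) & (y <= upper))

# Normalized mean interval width: average width scaled by the target range,
# so narrower values indicate sharper (more informative) intervals.
nmpiw = np.mean(upper - lower) / (y.max() - y.min())

print(coverage)  # 1.0: every target lies inside its interval here
print(nmpiw)
```

A good interval method keeps coverage at or above the nominal level (e.g. 90%) while making the normalized width as small as possible; the abstract's claim is that SEMF often improves on quantile regression along both axes.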