This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile, model-agnostic approach for generating prediction intervals with any ML model. SEMF extends the Expectation-Maximization algorithm, traditionally used in unsupervised learning, to a supervised setting, leveraging latent-variable modeling for uncertainty estimation. Through extensive empirical evaluation on diverse simulated distributions and 11 real-world tabular datasets, SEMF consistently produces narrower prediction intervals while maintaining the desired coverage probability, outperforming traditional quantile regression methods. Furthermore, without using the quantile (pinball) loss, SEMF allows point predictors, including gradient-boosted trees and neural networks, to be calibrated with conformal quantile regression. The results indicate that SEMF enhances uncertainty quantification under diverse data distributions and is particularly effective for models that otherwise struggle to represent inherent uncertainty.
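As background for the two interval-quality criteria the abstract evaluates (coverage probability and interval width) and for the idea of conformally calibrating an arbitrary point predictor, the sketch below shows a plain split-conformal procedure. This is illustrative background only, not SEMF itself; the data, the trivial predictor, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with heteroscedastic noise (illustrative only).
x = rng.uniform(0, 10, 2000)
y = x + rng.normal(0, 0.5 + 0.1 * x)

# Disjoint fit / calibration / test index splits.
fit_idx, cal_idx, test_idx = np.split(np.arange(2000), [1000, 1500])


def predict(xs):
    # Trivial stand-in for any trained point predictor
    # (e.g., gradient-boosted trees or a neural network).
    return xs


# Split-conformal calibration: take the appropriate empirical quantile
# of absolute residuals on the held-out calibration set.
alpha = 0.1  # target 90% coverage
res = np.abs(y[cal_idx] - predict(x[cal_idx]))
n_cal = len(cal_idx)
q = np.quantile(res, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal)

# Symmetric prediction intervals on the test set, then the two
# metrics the abstract refers to: empirical coverage and mean width.
lo = predict(x[test_idx]) - q
hi = predict(x[test_idx]) + q
coverage = np.mean((y[test_idx] >= lo) & (y[test_idx] <= hi))
width = np.mean(hi - lo)
print(f"coverage ~ {coverage:.2f}, mean width ~ {width:.2f}")
```

A narrower mean width at the same (or better) empirical coverage is exactly the improvement the abstract claims SEMF achieves over quantile-regression baselines.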