Decomposing prediction uncertainty into aleatoric (irreducible) and epistemic (reducible) components is critical for the reliable deployment of machine learning systems. While the mutual information between the response variable and the model parameters is a principled measure of epistemic uncertainty, it requires access to the parameter posterior, which is computationally challenging to approximate. Consequently, practitioners often rely on the probabilistic predictions of deep ensembles, which have demonstrated strong empirical performance, to quantify uncertainty. However, a theoretical understanding of their success from a frequentist perspective remains limited. We address this gap by first considering a bootstrap-based estimator of epistemic uncertainty, which we prove is asymptotically correct. Next, we connect deep ensembles to the bootstrap estimator by decomposing it into two components, data variability and training stochasticity; specifically, we show that deep ensembles capture the training-stochasticity component. Through empirical studies, we show that this stochasticity component constitutes the majority of epistemic uncertainty, thereby explaining the effectiveness of deep ensembles.
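To make the decomposition concrete, the following is a minimal illustrative sketch, not the paper's estimator: a bootstrap ensemble of simple logistic models stands in for deep networks, and the mutual-information-style split (entropy of the mean prediction minus the mean per-member entropy) separates total predictive uncertainty into aleatoric and epistemic parts. All function names and the toy data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, steps=200, lr=0.5):
    # Gradient descent on the logistic loss; a stand-in for training a deep model.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    # Binary class probabilities [p(y=0|x), p(y=1|x)].
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return np.stack([1.0 - p, p], axis=-1)

def entropy(p, axis=-1):
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)

# Toy binary classification data.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

# Bootstrap ensemble: each member is trained on a resampled dataset,
# so member disagreement reflects data variability.
B = 20
probs = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))
    w = fit_logistic(X[idx], y[idx])
    probs.append(predict_proba(w, X))
probs = np.stack(probs)                  # shape (B, n, 2)

mean_p = probs.mean(axis=0)              # ensemble predictive distribution
total = entropy(mean_p)                  # total predictive uncertainty
aleatoric = entropy(probs).mean(axis=0)  # expected per-member entropy
epistemic = total - aleatoric            # mutual-information-style estimate
```

By Jensen's inequality (entropy is concave), `epistemic` is nonnegative; it shrinks toward zero on inputs where the ensemble members agree and grows where they disagree.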