The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there exists a convex combination of the set's predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space. Moreover, we learn this combination via proper scoring rules, which inherently optimize for calibration. Building on differentiable, kernel-based estimators of calibration errors, we introduce a nonparametric testing procedure and demonstrate the benefits of capturing instance-level variability in synthetic and real-world experiments.
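A minimal sketch of the two ingredients described above, under illustrative assumptions: an instance-dependent convex combination of ensemble predictions is learned by minimizing a proper scoring rule (here the log loss), and the resulting combined predictor is scored with a kernel-based squared calibration error estimator (an SKCE-style statistic with a Gaussian kernel). The gating network, toy data, and kernel choice are placeholders, not the paper's exact construction; the full testing procedure would additionally compare such a statistic against a null distribution (e.g., via resampling).

```python
# Sketch (not the authors' implementation): instance-dependent convex weights
# over K ensemble members, trained with a proper scoring rule, plus a simple
# kernel-based calibration error estimate of the combined predictor.
import torch
import torch.nn as nn

K, C, D, N = 3, 2, 5, 512  # ensemble size, classes, input dim, sample size

# Toy data and fixed ensemble predictions (stand-ins for trained members).
torch.manual_seed(0)
X = torch.randn(N, D)
y = (X[:, 0] > 0).long()
ensemble_probs = torch.softmax(torch.randn(N, K, C), dim=-1)  # (N, K, C)

# Gating network: maps an input x to convex weights over the K members.
gate = nn.Sequential(nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, K))
opt = torch.optim.Adam(gate.parameters(), lr=1e-2)

for _ in range(200):
    w = torch.softmax(gate(X), dim=-1)                  # (N, K) convex weights
    p = torch.einsum("nk,nkc->nc", w, ensemble_probs)   # combined prediction
    loss = nn.functional.nll_loss(torch.log(p + 1e-12), y)  # log loss (proper)
    opt.zero_grad(); loss.backward(); opt.step()

def skce_biased(p, y, gamma=1.0):
    """Biased SKCE-style estimator: mean_{i,j} k(p_i, p_j) * <e_{y_i}-p_i, e_{y_j}-p_j>,
    with a Gaussian kernel on the predicted probability vectors (assumed choice)."""
    onehot = nn.functional.one_hot(y, p.shape[1]).float()
    r = onehot - p                                       # calibration residuals
    kern = torch.exp(-gamma * torch.cdist(p, p).pow(2))  # Gaussian kernel matrix
    return (kern * (r @ r.T)).mean()

with torch.no_grad():
    w = torch.softmax(gate(X), dim=-1)
    p = torch.einsum("nk,nkc->nc", w, ensemble_probs)
    print(f"SKCE estimate of combined predictor: {skce_biased(p, y):.4f}")
```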