It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture the so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal of letting the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in the form of distributions over probability distributions. However, recent work has revealed serious theoretical shortcomings of second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: there seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As the main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.