Statistical decision problems lie at the heart of statistical machine learning. The simplest problems are binary and multiclass classification and class probability estimation. Central to their definition is the choice of loss function, which is the means by which the quality of a solution is evaluated. In this paper we systematically develop the theory of loss functions for such problems from a novel perspective whose basic ingredients are convex sets with a particular structure. The loss function is defined as the subgradient of the support function of the convex set. It is consequently automatically proper (calibrated for probability estimation). This perspective provides three novel opportunities. First, it enables the development of a fundamental relationship between losses and (anti)-norms that appears not to have been noticed before. Second, it enables the development of a calculus of losses induced by the calculus of convex sets, which allows interpolation between different losses and is thus a potentially useful design tool for tailoring losses to particular problems. In doing this we build upon, and considerably extend, existing results on $M$-sums of convex sets. Third, the perspective leads to a natural theory of ``polar'' loss functions, which are derived from the polar dual of the convex set defining the loss, and which form a natural universal substitution function for Vovk's aggregating algorithm.