Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitting an inverse canonical link function that monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems. On a benchmark of domains with up to 1,000 classes, our experiments show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99\% significance on all datasets with more than 10 classes.
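To illustrate the idea that gradients of convex functions give monotone maps from $\mathbb{R}^{C-1}$ into the projected simplex, the following is a minimal sketch and not the {\sc LegendreTron} method itself. It assumes a fixed convex potential $F(z) = \log(1 + \sum_i e^{z_i})$, whose gradient is the multinomial logistic inverse link, as a stand-in for a learned convex function. The function names `log_partition`, `inverse_link`, and `monotonicity_gap` are hypothetical and chosen only for this example.

```python
import math


def log_partition(z):
    """Convex potential F(z) = log(1 + sum_i exp(z_i)) on R^{C-1}.

    Assumption: a fixed potential used for illustration; a learned
    convex function would take its place in a loss-learning method.
    """
    m = max(0.0, *z)  # shift for numerical stability
    return m + math.log(math.exp(-m) + sum(math.exp(v - m) for v in z))


def inverse_link(z):
    """Gradient of F at z: the first C-1 class probabilities.

    Every entry is positive and the entries sum to less than 1, so the
    output lies in the projected simplex; the last class probability is
    one minus that sum.
    """
    m = max(0.0, *z)
    exps = [math.exp(v - m) for v in z]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]


def monotonicity_gap(u, v):
    """Inner product <grad F(u) - grad F(v), u - v>.

    This is nonnegative for the gradient of any differentiable convex F,
    which is the multivariate notion of monotonicity.
    """
    pu, pv = inverse_link(u), inverse_link(v)
    return sum((a - b) * (x - y) for a, b, x, y in zip(pu, pv, u, v))


if __name__ == "__main__":
    u = [0.5, -1.0, 2.0]   # scores for a 4-class problem (C - 1 = 3)
    v = [-0.3, 0.7, 1.1]
    p = inverse_link(u)
    print("projected probabilities:", p)
    print("implied last-class probability:", 1.0 - sum(p))
    print("monotonicity gap:", monotonicity_gap(u, v))
```

For $C = 2$ the map reduces to the sigmoid, a monotone map from $\mathbb{R}$ to $[0,1]$, which matches the binary setting described above.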