A fundamental challenge in machine learning is the choice of a loss function: it characterizes the learning task, is minimized during training, and serves as an evaluation criterion for estimators. Proper losses are a common choice because they ensure that minimizers of the full risk coincide with the true probability vector. Estimators induced by a proper loss are widely used to construct forecasters for downstream tasks such as classification and ranking. In this procedure, how well does a forecaster built on the obtained estimator perform on a given downstream task? This question is closely related to the behavior of the $p$-norm between the estimated and true probability vectors as the estimator is updated. In the proper loss framework, the suboptimality of the estimated probability vector relative to the true probability vector is measured by the surrogate regret. First, we analyze the surrogate regret and show that strict properness of a loss is necessary and sufficient to establish a non-vacuous surrogate regret bound. Second, we settle an important open question: for a broad class of strictly proper losses, the order of convergence in $p$-norm cannot be faster than the $1/2$-order of the surrogate regret. This implies that strongly proper losses attain the optimal convergence rate.
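For concreteness, here is a minimal sketch of the quantities involved, under standard definitions; the notation $L$, $\mathrm{Reg}_\ell$, and $\lambda$ is illustrative rather than the paper's own. Writing $\Delta$ for the probability simplex and $L(q, p) = \mathbb{E}_{Y \sim p}[\ell(q, Y)]$ for the conditional risk of a loss $\ell$, the surrogate regret of an estimate $\hat{p}$ at the true probability vector $p$ is
\[
\mathrm{Reg}_\ell(\hat{p}; p) \;=\; L(\hat{p}, p) \;-\; \inf_{q \in \Delta} L(q, p).
\]
Strict properness means the infimum is attained uniquely at $q = p$, so $\mathrm{Reg}_\ell(\hat{p}; p) = L(\hat{p}, p) - L(p, p) > 0$ whenever $\hat{p} \neq p$. A $\lambda$-strongly proper loss additionally satisfies
\[
\mathrm{Reg}_\ell(\hat{p}; p) \;\ge\; \frac{\lambda}{2}\,\|\hat{p} - p\|_2^2,
\qquad\text{hence}\qquad
\|\hat{p} - p\|_2 \;\le\; \sqrt{\tfrac{2}{\lambda}\,\mathrm{Reg}_\ell(\hat{p}; p)},
\]
which is exactly the $1/2$-order rate that, per the result above, cannot be improved in general.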