In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance, owing to computational intractability or lack of differentiability. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees accounting for the hypothesis set $H$. These bounds offer stronger guarantees than Bayes-consistency or $H$-calibration and are more informative than excess error bounds. We begin with binary classification, establishing tight distribution-dependent and distribution-independent bounds. We provide explicit bounds for convex surrogates, with hypothesis sets that include linear models and neural networks, and analyze the adversarial setting for surrogates such as the $\rho$-margin and sigmoid losses. Extending to multi-class classification, we present the first $H$-consistency bounds for max, sum, and constrained losses, covering both non-adversarial and adversarial scenarios. We demonstrate that in some cases, non-trivial $H$-consistency bounds are unattainable. We also investigate comp-sum losses (e.g., cross-entropy, MAE), deriving their first $H$-consistency bounds and introducing smooth adversarial variants that yield robust learning algorithms. We develop a comprehensive framework for deriving these bounds across various surrogates, introducing new characterizations for constrained and comp-sum losses. Finally, we examine the growth rates of $H$-consistency bounds, establishing a universal square-root growth rate for smooth surrogates in binary and multi-class tasks, and analyze minimizability gaps to guide surrogate selection.
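Schematically, an $H$-consistency bound takes the following shape; the notation here is generic (not tied to any particular surrogate), and the paper's actual bounds also involve minimizability-gap terms:

```latex
% Generic form of an H-consistency bound: for every hypothesis h in H,
% the target-loss estimation error (e.g. for the zero-one loss \ell)
% is controlled by a non-decreasing function f of the surrogate-loss
% estimation error for the surrogate L.
\forall h \in H:\quad
  \mathcal{E}_{\ell}(h) - \mathcal{E}_{\ell}^{*}(H)
  \;\le\;
  f\bigl(\mathcal{E}_{L}(h) - \mathcal{E}_{L}^{*}(H)\bigr)
% The universal growth rate mentioned in the abstract corresponds to
% f(t) = O(\sqrt{t}) near t = 0 for smooth surrogates.
```

Because $f$ is applied to the surrogate estimation error for the same hypothesis set $H$, minimizing the surrogate over $H$ directly yields a quantitative guarantee on the target loss over $H$, which is precisely what Bayes-consistency alone cannot provide.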