This paper explores connections between margin-based loss functions and consistency in binary classification and regression applications. It is shown that a large class of margin-based loss functions for binary classification/regression yields estimating scores equivalent to log-likelihood scores weighted by an even function. A simple characterization of conformable (consistent) loss functions is given, which allows for straightforward comparison of different losses, including exponential loss, logistic loss, and others. The characterization is used to construct a new Huber-type loss function for the logistic model. A simple relation between the margin and standardized logistic regression residuals is derived, demonstrating that all margin-based losses can be viewed as loss functions of squared standardized logistic regression residuals. The relation provides new, straightforward interpretations of exponential and logistic loss, and aids in understanding why exponential loss is sensitive to outliers. In particular, it is shown that minimizing empirical exponential loss is equivalent to minimizing the sum of squared standardized logistic regression residuals. The relation also provides new insight into the AdaBoost algorithm.
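The stated equivalence between exponential loss and squared standardized residuals can be checked numerically. The following is a minimal sketch, not the paper's own code, and it rests on assumptions the abstract does not spell out: labels are coded y in {-1, +1}, the score f is taken to be the full log-odds so that p = 1/(1 + exp(-f)), and "standardized residual" is read as the Pearson residual (y01 - p) / sqrt(p(1 - p)) with y01 = (y + 1)/2. The paper may use a different scaling of the score (for example, half the log-odds, as is common in the AdaBoost literature), in which case the identity would hold with a rescaled margin. All function names are hypothetical.

```python
import math


def sigmoid(f):
    """Logistic function: maps a log-odds score f to a probability."""
    return 1.0 / (1.0 + math.exp(-f))


def squared_standardized_residual(y, f):
    """Squared Pearson residual of a logistic model.

    Assumptions (not taken from the source): y is coded in {-1, +1}
    and f is the log-odds score, so p = sigmoid(f).
    """
    p = sigmoid(f)
    y01 = (y + 1) / 2  # recode the label to {0, 1}
    return (y01 - p) ** 2 / (p * (1.0 - p))


def exponential_loss(y, f):
    """Exponential loss evaluated at the margin y * f."""
    return math.exp(-y * f)


# A few (label, score) pairs, including confidently wrong predictions.
samples = [(1, 2.0), (1, -3.0), (-1, 0.5), (-1, 4.0)]

for y, f in samples:
    r2 = squared_standardized_residual(y, f)
    loss = exponential_loss(y, f)
    print(f"y={y:+d} f={f:+.1f} residual^2={r2:.6f} exp_loss={loss:.6f}")
```

Under these assumptions the two columns agree for every pair, because (1 - p)/p = exp(-f) when y = +1 and p/(1 - p) = exp(f) when y = -1. The confidently misclassified points, (1, -3.0) and (-1, 4.0), have squared residuals that grow exponentially in the size of the negative margin, which is one way to see the sensitivity to outliers that the abstract mentions.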