Cross-Entropy Loss Functions: Theoretical Analysis and Applications

Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network when the softmax is used. But what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper-bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps, which depend only on the loss function and the hypothesis set. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, derived from their comp-sum counterparts by adding a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state-of-the-art, while also achieving a superior non-adversarial accuracy.
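To make the comp-sum family concrete, here is a minimal sketch of how cross-entropy, generalized cross-entropy, and the mean absolute error can be read as members of one parametrized family. The abstract does not state the parametrization, so the form used here is an assumption: a function $\Phi^{\tau}$ applied to $u = \sum_{y' \neq y} e^{h(x, y') - h(x, y)}$, with $\Phi^{\tau}(u) = \log(1 + u)$ for $\tau = 1$ and $\Phi^{\tau}(u) = \frac{(1 + u)^{1 - \tau} - 1}{1 - \tau}$ otherwise. The names `comp_sum_loss`, `softmax`, `scores`, and `tau` are illustrative only.

```python
import math


def softmax(scores):
    """Softmax probabilities for a list of real-valued scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def comp_sum_loss(scores, label, tau):
    """Comp-sum loss for one example, under the ASSUMED parametrization above.

    scores: list of scores h(x, y), one per class
    label:  index of the true class y
    tau:    family parameter (assumed: 1 gives cross-entropy, 2 gives 1 - p_y)
    """
    # u = sum over y' != y of exp(h(x, y') - h(x, y)); equals 1/p_y - 1,
    # where p_y is the softmax probability of the true class.
    u = sum(
        math.exp(s - scores[label])
        for i, s in enumerate(scores)
        if i != label
    )
    if math.isclose(tau, 1.0):
        return math.log1p(u)  # logistic loss / cross-entropy
    return ((1.0 + u) ** (1.0 - tau) - 1.0) / (1.0 - tau)


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0]
    p_true = softmax(scores)[0]
    print("cross-entropy:", comp_sum_loss(scores, 0, 1.0), -math.log(p_true))
    print("1 - p_y      :", comp_sum_loss(scores, 0, 2.0), 1.0 - p_true)
```

Under this assumed form, $\tau = 1$ recovers $-\log p_y$ and $\tau = 2$ recovers $1 - p_y$, where $p_y$ is the softmax probability of the true class; values of $\tau$ strictly between 1 and 2 give $\frac{1 - p_y^{q}}{q}$ with $q = \tau - 1$. The sketch exponentiates raw score differences, so a practical implementation would need a log-sum-exp formulation for large scores.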
