Cross-Entropy Loss Functions: Theoretical Analysis and Applications

Cross-entropy is a loss function widely used in applications. It coincides with the logistic loss applied to the outputs of a neural network when the softmax is used. But what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, that are derived from their comp-sum counterparts by adding a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state of the art, while also achieving superior non-adversarial accuracy.
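To make the comp-sum family concrete, the following is a minimal sketch of how cross-entropy, generalized cross-entropy and the mean absolute error can be written as one parametrized loss. The abstract does not state the formula, so the parametrization here is an assumption: for a score vector h(x, .) and true label y, the loss is taken to be Phi_tau(u) with u the sum over y' != y of exp(h(x, y') - h(x, y)), where Phi_1(u) = log(1 + u) and Phi_tau(u) = ((1 + u)^(1 - tau) - 1) / (1 - tau) for tau != 1. The names `comp_sum_loss`, `softmax` and `tau` are illustrative, not taken from the text.

```python
import numpy as np


def softmax(scores):
    """Softmax of a 1-D score vector, shifted by the max for stability."""
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def comp_sum_loss(scores, y, tau):
    """Comp-sum loss for one example under the ASSUMED parametrization by tau >= 0.

    scores: 1-D array of scores h(x, y') for every label y'.
    y:      index of the true label.
    tau:    family parameter (assumed): 1 gives the logistic loss, values in
            (1, 2) give generalized cross-entropy, 2 gives the mean absolute error.
    """
    scores = np.asarray(scores, dtype=float)
    # u = sum_{y' != y} exp(h(x, y') - h(x, y)), which equals 1 / softmax_y - 1.
    u = np.sum(np.exp(np.delete(scores, y) - scores[y]))
    if np.isclose(tau, 1.0):
        # log(1 + u) = -log softmax_y, i.e. cross-entropy.
        return np.log1p(u)
    return ((1.0 + u) ** (1.0 - tau) - 1.0) / (1.0 - tau)


if __name__ == "__main__":
    scores, y = [2.0, 0.5, -1.0], 0
    p = softmax(scores)[y]
    for tau in (0.0, 1.0, 1.5, 2.0):
        print(f"tau={tau}: loss={comp_sum_loss(scores, y, tau):.6f}")
    print(f"-log p_y = {-np.log(p):.6f}, 1 - p_y = {1.0 - p:.6f}")
```

Under this assumed form, tau = 1 reproduces -log p_y and tau = 2 reproduces 1 - p_y, where p_y is the softmax probability of the true label. The sketch is not numerically hardened: very large score gaps can overflow in the exponential.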
