Nearly all practical neural models for classification are trained using cross-entropy loss. Yet this ubiquitous choice is supported by little theoretical or empirical evidence. Recent work (Hui & Belkin, 2020) suggests that training with the (rescaled) square loss often yields better classification accuracy. In this paper we propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy. We also demonstrate that it provides significantly better model calibration than either of these alternative losses and, furthermore, has less variance with respect to random initialization. Additionally, in contrast to the square loss, a model can typically be trained with the squentropy loss using exactly the same optimization parameters, including the learning rate, as with the standard cross-entropy loss, making it a true "plug-and-play" replacement. Finally, unlike the rescaled square loss, multi-class squentropy contains no parameters that need to be adjusted.
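The abstract defines squentropy only in words, so the following is a minimal sketch of one reading of that definition rather than the authors' implementation. It assumes that the square term is applied to the raw logits of the incorrect classes with a target of 0, and that "average" means dividing by the number of incorrect classes (C - 1). The abstract does not state either detail. The function names `squentropy` and `batch_squentropy` are hypothetical.

```python
import math


def squentropy(logits, label):
    """Squentropy loss for a single example (illustrative sketch).

    Assumption: the square term is the mean of the squared raw logits
    over the incorrect classes (target 0 for each incorrect class).
    """
    num_classes = len(logits)
    if num_classes < 2:
        raise ValueError("squentropy needs at least two classes")

    # Cross-entropy term, computed with a numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    cross_entropy = log_sum_exp - logits[label]

    # Average square loss over the incorrect classes only.
    square_term = sum(
        z * z for j, z in enumerate(logits) if j != label
    ) / (num_classes - 1)

    return cross_entropy + square_term


def batch_squentropy(batch_logits, labels):
    """Mean squentropy over a batch of (logits, label) pairs."""
    losses = [squentropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```

Under these assumptions the loss introduces no extra hyperparameter: the only quantities involved are the logits, the label, and the number of classes, which is consistent with the abstract's claim that multi-class squentropy has no parameters to adjust.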