A key factor in deploying machine learning algorithms for decision-making is not only the accuracy of a model but also its confidence. In classification problems, a model's confidence is often taken, for convenience, to be the output vector of a softmax function. However, these values are known to deviate significantly from the model's actual expected confidence. Correcting this mismatch is known as model calibration, and it has been studied extensively. One of the simplest techniques for this task is focal loss, which generalizes cross-entropy by introducing a single positive parameter. Although the simplicity of the idea and its formulation has inspired many related studies, the theoretical analysis of its behavior is still insufficient. In this study, we aim to understand the behavior of focal loss by reinterpreting it geometrically. Our analysis suggests that focal loss reduces the curvature of the loss surface during training. This indicates that curvature may be one of the essential factors in achieving model calibration. We design numerical experiments that support this conjecture and reveal both the behavior of focal loss and the relationship between calibration performance and curvature.
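For reference, focal loss is commonly written as FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the softmax probability assigned to the true class and gamma >= 0 is the single extra parameter; gamma = 0 recovers ordinary cross-entropy. The sketch below assumes NumPy and uses hypothetical helper names (`log_softmax`, `focal_loss`, `cross_entropy`). It only illustrates this definition and its reduction to cross-entropy, and is not the study's implementation. It does not reproduce the curvature analysis.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean of -log p_t, where p_t is the softmax probability of the true class.
    log_p = log_softmax(logits)
    return np.mean(-log_p[np.arange(len(labels)), labels])

def focal_loss(logits, labels, gamma=2.0):
    """Mean focal loss -(1 - p_t)^gamma * log(p_t) over a batch.

    logits: (N, K) array of unnormalized class scores.
    labels: (N,) array of integer class indices.
    gamma = 0 gives standard cross-entropy; larger gamma down-weights
    examples the model already classifies with high confidence.
    """
    log_p = log_softmax(logits)
    log_p_t = log_p[np.arange(len(labels)), labels]  # log-prob of the true class
    p_t = np.exp(log_p_t)
    return np.mean(-((1.0 - p_t) ** gamma) * log_p_t)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)
print(cross_entropy(logits, labels), focal_loss(logits, labels, 0.0), focal_loss(logits, labels, 2.0))
```

Because the modulating factor (1 - p_t)^gamma lies in [0, 1] for gamma >= 0, the focal loss of each example never exceeds its cross-entropy, and the gap widens as p_t approaches 1.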