Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Generalized cross-entropy (GCE), which interpolates between the CE and MAE losses, has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
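To make the interpolation between CE and MAE concrete, the following is a minimal sketch of the existing (non-minimax) GCE loss, assuming the commonly used form (1 - p_y^q) / q with a parameter q in (0, 1], where p_y is the predicted probability of the true class. The function names and the parameter name q are illustrative assumptions, not taken from this paper, and the sketch does not implement the proposed MGCE.

```python
import numpy as np


def true_class_probs(probs, labels):
    """Pick the predicted probability of the true class for each sample."""
    return probs[np.arange(len(labels)), labels]


def gce_loss(probs, labels, q):
    """Standard GCE, assumed form (1 - p_y^q) / q with q in (0, 1]."""
    p_true = true_class_probs(probs, labels)
    return (1.0 - p_true ** q) / q


def ce_loss(probs, labels):
    """Cross-entropy: -log p_y. This is the limit of GCE as q -> 0."""
    return -np.log(true_class_probs(probs, labels))


def mae_loss(probs, labels):
    """MAE between the one-hot label and the predicted distribution.

    For probabilities that sum to one this equals 2 * (1 - p_y),
    i.e. twice the GCE loss at q = 1.
    """
    one_hot = np.eye(probs.shape[1])[labels]
    return np.abs(one_hot - probs).sum(axis=1)


if __name__ == "__main__":
    # Hypothetical predictions for three samples over three classes.
    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4]])
    labels = np.array([0, 1, 0])
    for q in (1e-6, 0.5, 1.0):
        print(q, gce_loss(probs, labels, q))
    print("CE   ", ce_loss(probs, labels))
    print("MAE/2", mae_loss(probs, labels) / 2)
```

Under these assumptions, small q recovers CE (easier to optimize, less robust to label noise), while q = 1 recovers MAE up to a constant factor (more robust, harder to optimize).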