Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition is that for such losses, the global optimum is to predict the ground-truth probabilities, and such a predictor is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors, which are unlikely to contain the ground truth. Under what circumstances does optimizing a proper loss over a restricted family yield calibrated models? What precise calibration guarantees does it give? In this work, we provide a rigorous answer to these questions. We replace global optimality with a local optimality condition stipulating that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor with this local optimality satisfies smooth calibration as defined in Kakade and Foster (2008) and Błasiok et al. (2023). Local optimality is plausibly satisfied by well-trained DNNs, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal.
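To make the local optimality condition concrete, the following is a minimal sketch of how one might estimate the loss reduction achievable by post-processing, for the squared loss on synthetic binary data. Everything here is an illustrative assumption rather than the construction used in the paper: the function names `squared_loss` and `post_processing_gap`, the synthetic data, and the small two-parameter family of updates v ↦ clip(v + a(v − 0.5) + b, 0, 1) with |a| ≤ 1. Because only a small family is searched, the returned value is only a lower bound on the reduction available to richer families of Lipschitz post-processings.

```python
import numpy as np


def squared_loss(pred, y):
    """Mean squared (Brier) loss, a proper loss for binary outcomes."""
    return float(np.mean((y - pred) ** 2))


def post_processing_gap(pred, y, n_grid=21):
    """Loss reduction from the best update in a small illustrative family.

    Assumption: updates have the form eta(v) = a * (v - 0.5) + b with
    |a| <= 1, so eta is 1-Lipschitz; the result is clipped to [0, 1].
    This family is hypothetical and far smaller than a full class of
    Lipschitz post-processings, so the value is a lower bound only.
    """
    base = squared_loss(pred, y)
    best = base  # the identity post-processing is always available
    for a in np.linspace(-1.0, 1.0, n_grid):
        for b in np.linspace(-0.5, 0.5, n_grid):
            updated = np.clip(pred + a * (pred - 0.5) + b, 0.0, 1.0)
            best = min(best, squared_loss(updated, y))
    return base - best


# Synthetic data: true probabilities p and labels y ~ Bernoulli(p).
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 1.0, size=20_000)
y = rng.binomial(1, p).astype(float)

calibrated = p                                # predicts the ground truth
miscalibrated = np.clip(p + 0.2, 0.0, 1.0)    # systematically over-confident

gap_cal = post_processing_gap(calibrated, y)
gap_mis = post_processing_gap(miscalibrated, y)
print(f"gap (calibrated predictor):    {gap_cal:.4f}")
print(f"gap (miscalibrated predictor): {gap_mis:.4f}")
```

In this sketch the calibrated predictor leaves almost nothing to gain from post-processing, whereas the shifted predictor can be improved noticeably by a simple downward shift, which is the qualitative behavior the local optimality condition is meant to capture.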