Model calibration usually requires optimizing some parameters (e.g., temperature) with respect to an objective function (e.g., negative log-likelihood). In this paper, we report a simple, important, but often neglected fact: the objective function is influenced by the difficulty of the calibration set, i.e., the ratio of the number of incorrectly classified samples to that of correctly classified samples. If a test set has a drastically different difficulty level from the calibration set, the optimal calibration parameters of the two datasets differ. In other words, a calibrator that is optimal on the calibration set is suboptimal on an out-of-distribution (OOD) test set, and its performance degrades. With this knowledge, we propose a simple and effective method named adaptive calibrator ensemble (ACE) to calibrate OOD datasets, whose difficulty is usually higher than that of the calibration set. Specifically, two calibration functions are trained, one for in-distribution data (low difficulty) and the other for severely OOD data (high difficulty). To achieve desirable calibration on a new OOD dataset, ACE uses an adaptive weighting method that strikes a balance between the two extreme functions. When plugged into existing calibrators, ACE generally improves the performance of several state-of-the-art calibration schemes on a series of OOD benchmarks. Importantly, this improvement does not come at the cost of in-distribution calibration accuracy.
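To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It fits a temperature by grid search on negative log-likelihood for a low-difficulty and a high-difficulty synthetic set, then blends the two calibrators on a new set. The abstract does not specify the weighting rule, so `adaptive_weight` (linear interpolation on mean max-softmax confidence) and the choice to mix probabilities are assumptions made for illustration. All function names, the synthetic data generator `make_split`, and its parameters are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    probs = softmax(logits, temperature)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 200)):
    # Grid search keeps the sketch dependency-free; any 1-D optimizer would do.
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def difficulty(logits, labels):
    # Difficulty as defined in the abstract: #incorrect / #correct.
    wrong = int((logits.argmax(axis=1) != labels).sum())
    correct = len(labels) - wrong
    return wrong / max(correct, 1)

def mean_confidence(logits):
    return float(softmax(logits).max(axis=1).mean())

def adaptive_weight(conf_test, conf_easy, conf_hard):
    # ASSUMPTION: hypothetical weighting rule, not taken from the paper.
    if conf_easy == conf_hard:
        return 1.0
    w = (conf_test - conf_hard) / (conf_easy - conf_hard)
    return float(np.clip(w, 0.0, 1.0))

def ensemble_probs(logits, t_easy, t_hard, weight):
    # ASSUMPTION: mix calibrated probabilities; rows still sum to 1.
    return weight * softmax(logits, t_easy) + (1.0 - weight) * softmax(logits, t_hard)

def make_split(n, margin, error_rate, rng, num_classes=3):
    # Hypothetical synthetic data: one class gets a logit boost of `margin`,
    # and a fraction `error_rate` of predictions is made wrong via the labels.
    logits = rng.normal(size=(n, num_classes))
    logits[np.arange(n), rng.integers(0, num_classes, size=n)] += margin
    pred = logits.argmax(axis=1)
    flip = rng.random(n) < error_rate
    labels = np.where(flip, (pred + 1) % num_classes, pred)
    return logits, labels

rng = np.random.default_rng(0)
easy_logits, easy_labels = make_split(2000, margin=4.0, error_rate=0.1, rng=rng)
hard_logits, hard_labels = make_split(2000, margin=2.0, error_rate=0.5, rng=rng)
test_logits, _ = make_split(2000, margin=3.0, error_rate=0.3, rng=rng)

t_easy = fit_temperature(easy_logits, easy_labels)  # in-distribution calibrator
t_hard = fit_temperature(hard_logits, hard_labels)  # severely-OOD calibrator
w = adaptive_weight(mean_confidence(test_logits),
                    mean_confidence(easy_logits),
                    mean_confidence(hard_logits))
probs = ensemble_probs(test_logits, t_easy, t_hard, w)

print("difficulty easy/hard:", difficulty(easy_logits, easy_labels),
      difficulty(hard_logits, hard_labels))
print("temperatures easy/hard:", t_easy, t_hard, "weight:", w)
```

In this toy setup the harder set yields a larger fitted temperature, which illustrates why a single temperature fitted on in-distribution data tends to be overconfident on harder data. Note that the weight is computed from unlabeled test logits only, so no OOD labels are needed at test time in this sketch.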