Estimating test accuracy without access to ground-truth test labels, under varying test environments, is a challenging yet extremely important problem for the safe deployment of machine learning algorithms. Existing works rely on information from either the outputs or the extracted features of neural networks to formulate an estimation score that correlates with the ground-truth test accuracy. In this paper, we investigate, both empirically and theoretically, how the information provided by gradients can be predictive of the ground-truth test accuracy even under distribution shift. Specifically, we use the norm of the classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over the test data. Our key idea is that a model that does not generalize to a distribution-shifted test dataset should require an adjustment with a larger gradient magnitude. We provide theoretical insights that highlight the main ingredients of this approach and that ensure its empirical success. Extensive experiments on diverse distribution shifts and model architectures demonstrate that our method significantly outperforms state-of-the-art algorithms.
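As an illustration of the score described above, the following is a minimal sketch, not the paper's implementation. It assumes a linear classification layer (`logits = features @ weights + bias`) and, because test labels are unavailable, it substitutes the model's own argmax predictions as pseudo-labels; the abstract does not specify the labeling strategy, so that choice is an assumption. The function name `classification_grad_norm` and all variable names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def classification_grad_norm(features, weights, bias):
    """Frobenius norm of the cross-entropy gradient w.r.t. the last-layer weights.

    features: (n, d) array of penultimate-layer features for unlabeled test data
    weights:  (d, k) classification-layer weight matrix
    bias:     (k,)   classification-layer bias

    Assumption: argmax predictions serve as pseudo-labels, since no
    ground-truth test labels are available.
    """
    logits = features @ weights + bias
    probs = softmax(logits)

    # Pseudo-labels from the model's own predictions (assumed strategy).
    pseudo_labels = probs.argmax(axis=1)
    one_hot = np.eye(weights.shape[1])[pseudo_labels]

    # Mean cross-entropy gradient w.r.t. weights: X^T (p - y) / n.
    grad_weights = features.T @ (probs - one_hot) / features.shape[0]
    return float(np.linalg.norm(grad_weights))


# Toy comparison: a confident head versus a nearly uniform one.
features = np.eye(3)
confident = classification_grad_norm(features, 10.0 * np.eye(3), np.zeros(3))
uncertain = classification_grad_norm(features, 0.1 * np.eye(3), np.zeros(3))
print(confident, uncertain)
```

In this toy setting the nearly uniform head yields the larger gradient norm, which matches the intuition that a model fitting the test data poorly would need a larger adjustment. How such a score maps to an accuracy estimate would still have to be calibrated empirically.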