This work aims to assess how well a model performs under distribution shift without using labels. While recent methods study prediction confidence, this work reports that prediction dispersity is another informative cue. Confidence reflects whether an individual prediction is certain; dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well-performing model should give predictions with both high confidence and high dispersity, so both properties must be considered to make more accurate estimates. To this end, we use the nuclear norm, which has been shown to be effective in characterizing both properties. Extensive experiments validate the effectiveness of the nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shift (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitations of the nuclear norm, study its improved variant under severe class imbalance, and discuss potential directions.
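To illustrate why the nuclear norm responds to both confidence and dispersity, the following is a minimal sketch rather than the authors' implementation. It assumes the norm is taken over an N×K matrix of softmax predictions (N unlabeled samples, K classes) and divided by the bound sqrt(min(N, K) · N), which holds because each softmax row has Euclidean norm at most 1. The function names and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def normalized_nuclear_norm(probs: np.ndarray) -> float:
    """Nuclear norm (sum of singular values) of an N x K prediction matrix,
    scaled into [0, 1] by the bound sqrt(min(N, K) * N).
    The scaling is an assumption made here for readability of the score."""
    n, k = probs.shape
    nuclear = np.linalg.norm(probs, ord="nuc")
    return float(nuclear / np.sqrt(min(n, k) * n))


# Toy prediction matrices with N = 8 samples and K = 4 classes (illustrative only).
n_samples, n_classes = 8, 4

# Confident and dispersed: one-hot rows spread evenly over all classes.
dispersed = np.eye(n_classes)[np.arange(n_samples) % n_classes]

# Confident but collapsed: every sample is assigned to class 0.
collapsed = np.eye(n_classes)[np.zeros(n_samples, dtype=int)]

# Unconfident: every row is the uniform distribution over classes.
uniform = np.full((n_samples, n_classes), 1.0 / n_classes)

score_dispersed = normalized_nuclear_norm(dispersed)  # analytically 1
score_collapsed = normalized_nuclear_norm(collapsed)  # analytically 1 / sqrt(K)
score_uniform = normalized_nuclear_norm(uniform)      # analytically 1 / K
```

In this toy setting the score is highest only when predictions are both confident and spread across classes; collapsing onto one class or flattening every row to uniform lowers it. Turning such a score into an accuracy estimate on real data would still require a mapping fitted on held-out shifted sets, which this sketch does not cover.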