Where are we with calibration under dataset shift in image classification?

Source: arXiv. Code available at https://github.com/biomedia-mira/calibration_under_shifts. Published in TMLR, October 2025 (https://openreview.net/forum?id=1NYKXlRU2H).

We conduct an extensive study on the state of calibration under real-world dataset shift for image classification. Our work provides insights on the choice of post-hoc and in-training calibration techniques, and yields practical guidelines for practitioners interested in robust calibration under shift. We compare various post-hoc calibration methods, and their interactions with common in-training calibration strategies (e.g., label smoothing), across a wide range of natural shifts, on eight different classification tasks across several imaging domains. We find that: (i) simultaneously applying entropy regularisation and label smoothing yields the best-calibrated raw probabilities under dataset shift, (ii) post-hoc calibrators exposed to a small amount of semantic out-of-distribution (OOD) data (unrelated to the task) are most robust under shift, (iii) recent calibration methods specifically aimed at improving calibration under shifts do not necessarily offer significant improvements over simpler post-hoc calibration methods, and (iv) improving calibration under shifts often comes at the cost of worsening in-distribution (ID) calibration. Importantly, these findings hold for randomly initialised classifiers as well as for those finetuned from foundation models, the latter being consistently better calibrated than models trained from scratch. Finally, we conduct an in-depth analysis of ensembling effects, finding that (i) applying calibration prior to ensembling (instead of after) is more effective for calibration under shifts, (ii) for ensembles, OOD exposure deteriorates the trade-off between ID and shifted calibration, and (iii) ensembling remains one of the most effective methods to improve calibration robustness and, combined with finetuning from foundation models, yields the best calibration results overall.
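To make the terms in the abstract concrete, the following is a minimal sketch of post-hoc calibration and of the "calibrate, then ensemble" ordering. The abstract does not name the specific calibrators or metrics studied, so this sketch assumes temperature scaling as a representative simple post-hoc method and expected calibration error (ECE) as the metric. All function names are hypothetical and are not taken from the authors' repository, and the grid search stands in for the gradient-based fitting that is more commonly used.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def ece(probs, labels, n_bins=15):
    """Expected calibration error with equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature minimising NLL on held-out data (simple grid search)."""
    if grid is None:
        grid = np.linspace(0.1, 10.0, 100)
    losses = [nll(softmax(logits, t), labels) for t in grid]
    return float(grid[int(np.argmin(losses))])

def calibrate_then_ensemble(member_logits, temperatures):
    """Calibrate each ensemble member separately, then average the probabilities."""
    probs = [softmax(l, t) for l, t in zip(member_logits, temperatures)]
    return np.mean(probs, axis=0)
```

In use, `fit_temperature` would be run once per model on a held-out calibration set, and the resulting temperatures passed to `calibrate_then_ensemble` at test time. The alternative ordering discussed in the abstract would average the uncalibrated member outputs first and fit a single calibrator on the ensemble output.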

