While fine-tuning unlocks the potential of a pre-trained model for a specific task, it compromises the model's ability to generalize to out-of-distribution (OOD) datasets. To mitigate this, robust fine-tuning aims to preserve performance on OOD datasets as well as on the in-distribution (ID) dataset for which the model is being tuned. However, another criterion for reliable machine learning (ML), confidence calibration, has been overlooked despite growing demand for it in real-world, high-stakes ML applications (e.g., autonomous driving and medical diagnosis). For the first time, we raise concerns about the calibration of fine-tuned vision-language models (VLMs) under distribution shift by showing that naive fine-tuning, and even state-of-the-art robust fine-tuning methods, hurt the calibration of pre-trained VLMs, especially on OOD datasets. To address this issue, we propose a simple approach, called calibrated robust fine-tuning (CaRot), that incentivizes both calibration and robustness on ID and OOD datasets. Empirical results from an ImageNet-1K distribution shift evaluation verify the effectiveness of our method.
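The abstract does not say how calibration is quantified. As an illustration only, the sketch below assumes the widely used expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by the fraction of samples in the bin, so 0 means perfectly calibrated. The function name `expected_calibration_error`, the equal-width binning, and the 15-bin default are assumptions made for this example, not details of CaRot.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 15,  # assumed default; not specified in the abstract
) -> float:
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin.

    Illustrative sketch of a common calibration metric, not CaRot's own code.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equally sized, non-empty inputs")
    conf_sum = [0.0] * n_bins  # summed confidence per bin
    hit_sum = [0] * n_bins     # number of correct predictions per bin
    count = [0] * n_bins       # number of samples per bin
    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} outside [0, 1]")
        # Bins are [k/n, (k+1)/n); a confidence of exactly 1.0 goes in the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += int(ok)
        count[k] += 1
    n = len(confidences)
    ece = 0.0
    for k in range(n_bins):
        if count[k]:
            gap = abs(hit_sum[k] / count[k] - conf_sum[k] / count[k])
            ece += count[k] / n * gap  # weight each bin by its share of samples
    return ece


# An overconfident model: it reports 0.9 confidence but is right only half the time,
# so its ECE is about 0.4 (confidence 0.9 versus accuracy 0.5).
print(expected_calibration_error([0.9, 0.9], [True, False]))
```

Under this metric, the miscalibration the abstract describes would show up as a larger ECE on OOD data than on ID data for the fine-tuned model.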