While fine-tuning unlocks the potential of a pre-trained model for a specific task, it compromises the model's ability to generalize to out-of-distribution (OOD) datasets. To mitigate this, robust fine-tuning aims to maintain performance on OOD datasets as well as on the in-distribution (ID) dataset on which the model is tuned. However, another criterion for reliable machine learning (ML), confidence calibration, has been overlooked despite increasing demand for it in real-world high-stakes ML applications (e.g., autonomous driving and medical diagnosis). For the first time, we raise concerns about the calibration of fine-tuned vision-language models (VLMs) under distribution shift by showing that naive fine-tuning and even state-of-the-art robust fine-tuning methods hurt the calibration of pre-trained VLMs, especially on OOD datasets. To address this issue, we propose a simple approach, called calibrated robust fine-tuning (CaRot), that incentivizes calibration and robustness on both ID and OOD datasets. Empirical results on ImageNet-1K distribution shift benchmarks verify the effectiveness of our method.
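The abstract does not name a calibration metric. One widely used measure is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between each bin's accuracy and its average confidence is weighted by the bin's share of samples. The sketch below is a minimal, hypothetical illustration of binned ECE. It is not CaRot, and the function name `expected_calibration_error`, the equal-width binning, and the default of 15 bins are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 15,  # assumed default; the abstract specifies no bin count
) -> float:
    """Hypothetical sketch of equal-width binned ECE.

    ECE = sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|.
    """
    n = len(confidences)
    if n == 0 or not (n == len(predictions) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")
    if n_bins < 1:
        raise ValueError("n_bins must be positive")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Bin b covers [b/n_bins, (b+1)/n_bins); a confidence of 1.0 goes
        # into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == label))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, correct in members if correct) / len(members)
        # Weight the bin's |accuracy - confidence| gap by its sample share.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

For example, with `confidences=[0.9, 0.9, 0.6, 0.6]`, `predictions=[1, 1, 0, 0]`, `labels=[1, 0, 0, 1]` and `n_bins=10`, the 0.9 bin is 50% accurate (gap 0.4, weight 0.5) and the 0.6 bin is 50% accurate (gap 0.1, weight 0.5), giving an ECE of about 0.25. Lower is better, and a perfectly calibrated model scores 0. Evaluating such a metric separately on ID and OOD test sets is one way to make the calibration degradation described above measurable.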