While fine-tuning unleashes the potential of a pre-trained model on a specific task, it sacrifices the model's generalization to out-of-distribution (OOD) datasets. To mitigate this, robust fine-tuning aims to maintain performance on OOD datasets as well as on the in-distribution (ID) dataset the model is tuned on. However, confidence calibration, another criterion for reliable machine learning (ML), has been overlooked despite growing demand for it in real-world high-stakes ML applications (e.g., autonomous driving and medical diagnosis). For the first time, we raise concerns about the calibration of fine-tuned vision-language models (VLMs) under distribution shift, showing that naive fine-tuning and even state-of-the-art robust fine-tuning methods hurt the calibration of pre-trained VLMs, especially on OOD datasets. To address this, we propose a simple approach, calibrated robust fine-tuning (CaRot), that incentivizes both calibration and robustness on ID and OOD datasets. Empirical results on the ImageNet-1K distribution shift evaluation verify the effectiveness of our method.
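The abstract does not say how calibration is measured. A widely used metric is expected calibration error (ECE): predictions are grouped into confidence bins, and ECE is the sample-weighted average gap between each bin's accuracy and its mean confidence. The following is a minimal sketch under stated assumptions. The function name `expected_calibration_error` is hypothetical, and the use of 15 equal-width bins is our assumption rather than a setting taken from this work.

```python
import math
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 15,  # assumption: equal-width bins, 15 is a common choice
) -> float:
    """Hypothetical helper: binned ECE = sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Bin i covers (i / n_bins, (i + 1) / n_bins]; a confidence of 0 goes to bin 0.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(max(math.ceil(conf * n_bins) - 1, 0), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, h in b if h) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    # Overconfident: 90% confidence but only 50% correct, so ECE is about 0.4.
    print(expected_calibration_error([0.9] * 4, [True, True, False, False]))
    # Calibrated: 80% confidence and 80% correct, so ECE is about 0.
    print(expected_calibration_error([0.8] * 5, [True, True, True, True, False]))
```

In this framing, the degradation the abstract describes would appear as a larger ECE on OOD data after fine-tuning: the model's confidence stays high while its accuracy drops.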