Foundation models pretrained on large-scale natural images are widely adapted to medical image analysis through finetuning. This is largely attributed to the pretrained representations capturing universal, robust, and generalizable features, which downstream tasks can reuse. However, these representations have been found to gradually vanish during finetuning, accompanied by a degradation of the foundation model's original abilities, e.g., generalizability. In this paper, we argue that pretrained representations can be well preserved while still adapting effectively to downstream tasks. We study this by proposing a new finetuning method, RepSim, which minimizes the distance between pretrained and finetuned representations by constraining a learnable orthogonal manifold based on similarity invariance. Compared to standard finetuning methods, e.g., full finetuning, our method improves representation similarity by over 30% while maintaining competitive accuracy, and reduces sharpness by 42% across five medical image classification datasets. The code will be released.
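The similarity invariance the abstract refers to can be illustrated with linear CKA (centered kernel alignment), a common representation-similarity measure that is invariant to orthogonal transformations of the feature space. The sketch below is illustrative only: the paper's exact similarity measure and manifold construction are not specified in the abstract, so the `linear_cka` function and the random orthogonal matrix `Q` are assumptions for demonstration.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, n_features)."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))                   # stand-in "pretrained" representations
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))   # random orthogonal matrix
# An orthogonal transform of the representations leaves linear CKA unchanged:
print(round(linear_cka(X, X @ Q), 6))                # → 1.0
```

Intuitively, this is why constraining the adaptation to an orthogonal manifold can let the finetuned representations move (to fit the downstream task) while remaining maximally similar to the pretrained ones under such a measure.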