We present a new paradigm for fine-tuning large-scale vision-language pretrained models on downstream tasks, dubbed Prompt Regularization (ProReg). Unlike traditional fine-tuning, which easily overfits to the downstream task data, ProReg uses the predictions obtained by prompting the pretrained model to regularize the fine-tuning. The motivation is that when the large model is prompted with "a photo of a [CLASS]", the fill-in answer depends only on the encyclopedic knowledge acquired during pretraining and is independent of the task data distribution, which is usually biased. Specifically, for the prediction on each training sample during fine-tuning, we first compute the Kullback-Leibler (KL) loss with respect to the prompt prediction and the cross-entropy (CE) loss with respect to the ground-truth label. We then combine them with a proposed sample-wise adaptive trade-off weight, which automatically adjusts the transfer between the pretrained and downstream domains. On various out-of-distribution benchmarks, we show that ProReg achieves consistently strong performance compared with conventional fine-tuning, zero-shot prompting, prompt tuning, and other state-of-the-art methods.
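The per-sample loss combination described above can be sketched as follows. This is a minimal plain-Python sketch, not the authors' implementation: it assumes the KL term uses the prompt distribution as the target, i.e. KL(prompt || model), and the `adaptive_weight` rule (the zero-shot prompt's confidence on the ground-truth class) is a hypothetical stand-in, since the abstract does not state how the sample-wise trade-off weight is computed.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    """CE loss of the fine-tuned model's prediction w.r.t. the ground-truth label."""
    return -math.log(probs[label])


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q): p is the prompt (zero-shot) distribution, q the model's prediction.
    The direction of the KL term is an assumption of this sketch."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def adaptive_weight(prompt_probs: List[float], label: int) -> float:
    """HYPOTHETICAL sample-wise weight: trust the prompt more when it already
    assigns high probability to the ground-truth class. Not ProReg's actual rule."""
    return prompt_probs[label]


def proreg_style_loss(model_logits: List[float], prompt_logits: List[float], label: int) -> float:
    """Combine CE (ground truth) and KL (prompt prediction) with a per-sample weight."""
    q = softmax(model_logits)   # fine-tuned model's prediction
    p = softmax(prompt_logits)  # prediction from prompting "a photo of a [CLASS]"
    w = adaptive_weight(p, label)
    return (1.0 - w) * cross_entropy(q, label) + w * kl_divergence(p, q)


if __name__ == "__main__":
    # Prompt agrees with the label: the KL term carries most of the weight.
    print(proreg_style_loss([0.5, 1.5, -1.0], [3.0, 0.0, -1.0], label=0))
    # Prompt is wrong about the label: the CE term dominates.
    print(proreg_style_loss([0.5, 1.5, -1.0], [-1.0, 3.0, 0.0], label=0))
```

With a weight of this form, samples on which the zero-shot prompt already agrees with the label are pulled toward the prompt's prediction, while samples on which the prompt is wrong are driven mainly by the ground-truth CE term; the weighting used in ProReg itself may differ.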