Prompt tuning and adapter tuning have shown great potential in transferring pre-trained vision-language models (VLMs) to various downstream tasks. In this work, we design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection. Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but is concealed during the upstream pre-training stage. To bring this useful knowledge back to light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each of these parameters, and finally optimize the masks on the downstream data while keeping the parameters frozen. When updating the masks, we introduce a novel gradient dropout strategy to regularize the parameter selection, preventing the model from forgetting old knowledge and from overfitting the downstream data. Experimental results on 11 datasets demonstrate the consistent superiority of our method over previous alternatives. Notably, we achieve an 18.73% performance improvement over zero-shot CLIP by masking, on average, only 2.56% of the parameters. Furthermore, our method is synergistic with most existing parameter-efficient tuning methods and can further boost their performance. The project page is available at https://wuw2019.github.io/RMT/.
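The mask-update step described above can be illustrated on a single frozen weight matrix. This is a minimal sketch under several assumptions that the abstract does not state. It uses real-valued mask scores thresholded at zero and a straight-through estimator for the gradient of the hard mask, as in earlier binary-mask methods. It also uses a toy squared-error loss and a simplified gradient dropout that zeros each score-gradient entry at random with a fixed probability. The paper's actual dropout rule and parameter-selection criterion are not reproduced here, and all function names (`binarize`, `gradient_dropout`, `mask_tuning_step`, `loss`) are hypothetical.

```python
import numpy as np


def binarize(scores, threshold=0.0):
    """Forward pass: turn real-valued mask scores into a hard 0/1 mask."""
    return (scores > threshold).astype(scores.dtype)


def gradient_dropout(grad, drop_prob, rng):
    """Simplified stand-in regularizer (assumption, not the paper's rule):
    zero each gradient entry independently with probability `drop_prob`.
    Kept entries are not rescaled."""
    keep = rng.random(grad.shape) >= drop_prob
    return grad * keep


def mask_tuning_step(x, y, w_frozen, scores, lr, drop_prob, rng):
    """One update of the mask scores; `w_frozen` is never modified.

    Toy loss: L = 0.5 / n * sum((x @ (w_frozen * mask) - y) ** 2).
    """
    mask = binarize(scores)
    n = x.shape[0]
    err = x @ (w_frozen * mask) - y      # residuals, shape (n, d_out)
    grad_eff = x.T @ err / n             # dL/d(w_frozen * mask)
    # Straight-through estimator: treat d(mask)/d(scores) as 1, so
    # dL/d(scores) is approximated by dL/d(mask) = grad_eff * w_frozen.
    grad_scores = gradient_dropout(grad_eff * w_frozen, drop_prob, rng)
    return scores - lr * grad_scores


def loss(x, y, w_frozen, scores):
    """Toy squared-error loss of the masked, frozen linear layer."""
    err = x @ (w_frozen * binarize(scores)) - y
    return 0.5 * np.sum(err ** 2) / x.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.eye(3)
    w_frozen = np.array([[1.0], [2.0], [-3.0]])   # "pre-trained" weights
    y = x @ np.array([[1.0], [2.0], [0.0]])       # task only needs the first two
    scores = np.full_like(w_frozen, 0.01)         # positive init: mask starts all ones
    for _ in range(5):
        scores = mask_tuning_step(x, y, w_frozen, scores,
                                  lr=0.1, drop_prob=0.0, rng=rng)
    print(binarize(scores).ravel())      # [1. 1. 0.]
    print(loss(x, y, w_frozen, scores))  # 0.0
```

In this toy run the weights themselves never change. Only the selection does: the third weight, which hurts the downstream fit, is switched off by its mask. Initializing the scores slightly above the threshold makes the starting model identical to the pre-trained one, which is one simple way to respect the idea that useful knowledge already lives in the frozen weights.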