Parameter-efficient fine-tuning (PEFT) techniques have emerged to address the overfitting and high computational costs associated with full fine-tuning in self-supervised learning. Mainstream PEFT methods add a small number of trainable parameters while keeping the pre-trained backbone parameters fixed. These methods achieve performance comparable, and often superior, to full fine-tuning, demonstrating the powerful representation ability of the pre-trained backbone. Despite this success, these methods typically overlook the initialization of the new parameters, relying solely on random initialization. We argue that if pre-training is significantly beneficial, it should be applied to all parameters that require representational capacity. Motivated by this, we propose Target Parameter Pre-training (TPP), a simple yet effective fine-tuning framework. TPP pre-trains the target parameters, i.e., the new parameters introduced during fine-tuning, in an additional stage before PEFT. During this stage, the pre-trained backbone parameters are frozen and only the new parameters are trainable. A dedicated pretext task encourages the new parameters to learn specific representations of the downstream data. When PEFT is subsequently employed, the pre-trained new parameters are loaded to enhance fine-tuning efficiency. The proposed TPP framework is versatile, allowing integration with various pre-trained backbones, pretext tasks, and PEFT methods. We evaluate the fine-tuning performance of our method on seven public datasets covering four modalities and two task types. The results demonstrate that TPP can be easily integrated into existing PEFT methods, significantly improving their performance.
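To make the two-stage procedure concrete, the following is a minimal PyTorch-style sketch of the workflow described above. All names (backbone, adapters, pretext_loss, tpp_pretrain, peft_finetune) are illustrative assumptions rather than the authors' implementation, and the backbone/adapter composition is deliberately simplified.

```python
import torch

def tpp_pretrain(backbone, adapters, unlabeled_loader, pretext_loss, epochs=10, lr=1e-3):
    """Stage 1 (TPP): pre-train only the target (new PEFT) parameters.

    Hypothetical sketch: `pretext_loss` is any user-defined pretext objective
    computed on downstream data; the backbone stays frozen throughout.
    """
    for p in backbone.parameters():
        p.requires_grad = False                      # freeze pre-trained backbone
    optimizer = torch.optim.AdamW(adapters.parameters(), lr=lr)
    for _ in range(epochs):
        for x in unlabeled_loader:                   # downstream data, no labels needed
            loss = pretext_loss(backbone, adapters, x)
            optimizer.zero_grad()
            loss.backward()                          # gradients flow only into the new parameters
            optimizer.step()
    return adapters.state_dict()                     # pre-trained target-parameter weights

def peft_finetune(backbone, adapters, target_state, labeled_loader, criterion, lr=1e-4):
    """Stage 2: standard PEFT fine-tuning, initialized from the TPP weights."""
    adapters.load_state_dict(target_state)           # load instead of random initialization
    optimizer = torch.optim.AdamW(adapters.parameters(), lr=lr)
    for x, y in labeled_loader:
        # Simplification: in practice the adapter modules (e.g., LoRA layers)
        # are inserted inside the backbone rather than applied to its output.
        logits = adapters(backbone(x))
        loss = criterion(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The sketch only conveys the ordering that the framework prescribes: freeze the backbone, pre-train the new parameters with a pretext task, then load those weights before running the chosen PEFT method as usual.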