Pretrained language models (PLMs) have substantially advanced natural language processing (NLP), but finetuning them on low-resource datasets remains challenging because training is often unstable and prone to overfitting. Previous methods tackle these issues by finetuning a strategically chosen sub-network on the downstream task while keeping the remaining weights fixed at their pretrained values. However, they rely on suboptimal criteria for sub-network selection, which leads to suboptimal solutions. To address these limitations, we propose a regularization method for finetuning PLMs based on attention-guided weight mixup. Our approach represents each network weight as a mixup of a task-specific weight and the pretrained weight, controlled by a learnable attention parameter, which provides finer control over sub-network selection. Furthermore, we employ a bi-level optimization (BLO) framework on two separate splits of the training dataset to improve generalization and combat overfitting. We validate the proposed method through extensive experiments, which show that it outperforms previous methods, particularly when finetuning PLMs on low-resource datasets.
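The weight mixup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact parametrization, so everything below is an assumption for illustration: the attention parameter is taken to be a per-weight logit squashed through a sigmoid, the mixup is taken to be an element-wise convex combination, and the names `mix_weights`, `task_w`, `pretrained_w`, and `attn_logits` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a mixing coefficient in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mix_weights(task_w, pretrained_w, attn_logits):
    """Element-wise mixup of task-specific and pretrained weights.

    Assumption (not stated in the abstract): each weight has its own
    attention logit a, and the effective weight is
        sigmoid(a) * task + (1 - sigmoid(a)) * pretrained.
    """
    if not (len(task_w) == len(pretrained_w) == len(attn_logits)):
        raise ValueError("all inputs must have the same length")
    mixed = []
    for w_t, w_p, a in zip(task_w, pretrained_w, attn_logits):
        lam = sigmoid(a)  # coefficient near 0 keeps the pretrained value
        mixed.append(lam * w_t + (1.0 - lam) * w_p)
    return mixed


# Toy values, chosen only to show the three regimes of the coefficient.
pretrained = [0.5, -1.0, 2.0]
task = [1.5, 0.0, 2.0]
logits = [0.0, -20.0, 20.0]  # equal mix, almost frozen, almost fully task-specific
mixed = mix_weights(task, pretrained, logits)
```

Under these assumptions, a strongly negative logit leaves a weight effectively frozen at its pretrained value, which recovers hard sub-network selection as a limiting case, while intermediate values interpolate between the two. In a bi-level setup such as the one the abstract mentions, the task-specific weights and the attention logits would presumably be updated on different splits of the training data; which split drives which parameter is not specified in the abstract and is left open here.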