While fine-tuning of pre-trained language models generally helps to overcome the lack of labelled training samples, it also suffers from performance instability, which stems mainly from randomness in initialisation or data shuffling. To address this, researchers either modify the training process or augment the available samples, which typically increases computational costs. We propose a new mitigation strategy, called Delayed Ensemble with Noisy Interpolation (DENI), that leverages the strengths of ensembling, noise regularisation and model interpolation while remaining computationally efficient. We compare DENI with 9 representative mitigation strategies across 3 models, 4 tuning strategies and 7 text classification datasets. We show that: 1) DENI outperforms the best-performing mitigation strategy (Ensemble) at only a fraction of its cost; 2) the mitigation strategies are beneficial for parameter-efficient fine-tuning (PEFT) methods, which outperform full fine-tuning in specific cases; and 3) combining DENI with data augmentation often leads to even more effective instability mitigation.
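To make the three ingredients concrete, the sketch below illustrates a DENI-style training loop in PyTorch. It is a minimal sketch under assumptions, not the paper's algorithm: the noise scale `sigma`, interpolation weight `lam`, the fork schedule, and the toy `nn.Linear` model with random data are all hypothetical placeholders.

```python
# A minimal, illustrative sketch of the three DENI ingredients in PyTorch.
# All hyperparameters (sigma, lam, fork_step, the toy nn.Linear model and
# random data) are hypothetical placeholders, not the paper's settings.
import copy
import torch
import torch.nn as nn

def perturb(model: nn.Module, sigma: float = 0.01) -> nn.Module:
    """Return a copy of `model` with Gaussian noise added to its weights."""
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * sigma)
    return noisy

def interpolate(model: nn.Module, other: nn.Module, lam: float = 0.5) -> None:
    """Move `model` weights toward `other` in place (noisy interpolation)."""
    with torch.no_grad():
        for p, q in zip(model.parameters(), other.parameters()):
            p.mul_(1.0 - lam).add_(q, alpha=lam)

# Toy stand-ins for a fine-tuned classifier and a labelled batch.
model = nn.Linear(16, 2)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))

total_steps, fork_step, ensemble_size = 200, 180, 5
ensemble, opts = [], []
for step in range(total_steps):
    if step < fork_step:
        # Phase 1: train a single model; periodically inject noise by
        # interpolating toward a perturbed copy of itself.
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if step > 0 and step % 20 == 0:
            interpolate(model, perturb(model))
    else:
        if step == fork_step:
            # Phase 2 ("delayed ensemble"): fork noisy copies only near the
            # end, so almost every step costs as much as a single model.
            ensemble = [perturb(model) for _ in range(ensemble_size)]
            opts = [torch.optim.AdamW(m.parameters(), lr=1e-3)
                    for m in ensemble]
        for m, o in zip(ensemble, opts):
            o.zero_grad()
            loss_fn(m(x), y).backward()
            o.step()

# Prediction: average the ensemble members' logits.
with torch.no_grad():
    logits = torch.stack([m(x) for m in ensemble]).mean(dim=0)
```

The point of delaying the fork is cost: the ensemble exists only for the final few steps, so the run stays close to the price of training a single model while still averaging over several diverse members at prediction time.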