The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by poor generalization: their performance drops sharply on examples that differ from the training data, known as Out-of-Distribution (OOD) or unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which hold for frequent example types but not for examples in general. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques, enhancing PLMs' generalization on OOD datasets while also improving their performance on in-distribution datasets. These findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
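The abstract does not state how the MLM objective is combined with the fine-tuning objective. As a minimal sketch of one plausible reading, the snippet below masks input tokens at random and forms a weighted sum of an MLM loss over the masked positions and a classification loss. The weight `alpha`, the masking probability `mask_prob`, and all function names are assumptions made for illustration, not details taken from the paper, and the logits stand in for the outputs of a real PLM.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target index under softmax(logits)."""
    return -log_softmax(logits)[target]


def mask_tokens(token_ids, mask_id, mask_prob, rng):
    """Replace each token with mask_id with probability mask_prob.

    Returns (masked_ids, labels); labels hold the original token id at
    masked positions and None everywhere else.
    """
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            masked.append(mask_id)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def mlm_loss(token_logits, labels):
    """Mean cross-entropy over masked positions (0.0 if nothing is masked)."""
    losses = [
        cross_entropy(logits, label)
        for logits, label in zip(token_logits, labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(token_logits, mlm_labels, class_logits, class_label, alpha=0.5):
    """Assumed combination: alpha * MLM loss + (1 - alpha) * task loss."""
    task = cross_entropy(class_logits, class_label)
    return alpha * mlm_loss(token_logits, mlm_labels) + (1 - alpha) * task
```

Under this assumed formulation, setting `alpha` to 0 recovers ordinary fine-tuning, and setting it to 1 trains on the MLM objective alone, so the weight controls how strongly the masked-token signal regularizes the task objective.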