The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by poor generalization: their performance drops sharply on examples that differ from the training data, known as Out-of-Distribution (OOD) or unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which hold for frequent example types but fail on examples in general. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to improve PLMs' generalization. Comprehensive experiments show that Mask-tuning outperforms current state-of-the-art techniques on OOD datasets while also improving performance on in-distribution datasets. These findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
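As a rough illustration of integrating an MLM objective into fine-tuning, the sketch below masks input tokens and combines a task loss with an MLM loss. The abstract does not say how the two objectives are combined, so the weighted sum, the `mlm_weight` parameter, the 15% default masking rate, and the function names `mask_tokens` and `joint_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder mask symbol (assumed; depends on the tokenizer)


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly replace tokens with MASK_TOKEN.

    Returns (masked_tokens, labels). labels holds the original token at each
    masked position and None elsewhere, so an MLM loss can be computed only
    on the masked positions. The 0.15 default is an assumed masking rate.
    """
    if rng is None:
        rng = random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def joint_loss(task_loss, mlm_loss, mlm_weight=0.5):
    """Combine the fine-tuning loss and the MLM loss as a weighted sum.

    The weighted-sum form and mlm_weight are hypothetical choices.
    """
    if not 0.0 <= mlm_weight <= 1.0:
        raise ValueError("mlm_weight must be in [0, 1]")
    return (1.0 - mlm_weight) * task_loss + mlm_weight * mlm_loss
```

In a real training loop, `task_loss` and `mlm_loss` would come from the model's classification head and MLM head on the same batch, and the combined value would be backpropagated in a single step.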