The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by poor generalization: their performance drops sharply when they are evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD) or unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which hold for frequent example types but fail on examples in general. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques, enhancing PLMs' generalization on OOD datasets while also improving their performance on in-distribution datasets. These findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
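The idea of folding an MLM objective into fine-tuning can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not state how tokens are masked or how the objectives are combined, so the 15% masking rate (the rate conventionally used for BERT-style pre-training), the weighted-sum form of the loss, and the names `mask_tokens`, `joint_loss`, and `mlm_weight` are all assumptions made here for illustration.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder symbol; real tokenizers define their own


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly replace tokens with MASK_TOKEN.

    Returns (masked_tokens, labels). labels[i] holds the original token
    at masked positions and None elsewhere, so an MLM loss would be
    computed only over the masked positions.
    NOTE: mask_prob=0.15 is an assumed default, not taken from the abstract.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def joint_loss(task_loss, mlm_loss, mlm_weight=0.5):
    """Combine the fine-tuning loss and the MLM loss as a weighted sum.

    NOTE: the weighted-sum form and mlm_weight are hypothetical; the
    abstract only says the MLM objective is integrated into fine-tuning.
    """
    return (1.0 - mlm_weight) * task_loss + mlm_weight * mlm_loss
```

In a real training loop, `task_loss` would come from the downstream classifier on the labeled example and `mlm_loss` from predicting the original tokens at the masked positions of the same batch; setting `mlm_weight=0.0` in this sketch recovers plain fine-tuning.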