During pre-training, natural language models learn a general representation of the pre-training dataset, which usually requires large amounts of textual data to capture the complexity and diversity of natural language. In contrast, the data available for a specific downstream task is usually far smaller than the pre-training dataset, especially in domains where data is scarce. We introduce controlled randomness, i.e. noise, into the training process to improve the fine-tuning of language models, and we explore the effect of targeted noise added to the parameters of these models. We find that adding such noise can improve performance on our two downstream tasks: joint named entity recognition and relation extraction, and text summarization.
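As an illustration of what injecting noise into model parameters during fine-tuning can look like, the following is a minimal sketch on a toy linear model, not the method evaluated in this work. It assumes zero-mean Gaussian noise drawn fresh at every update and a gradient evaluated at the perturbed parameters; the function names `add_gaussian_noise` and `finetune_with_noise`, the noise scale `sigma`, and the toy dataset are all hypothetical choices made for this example.

```python
import random


def add_gaussian_noise(params, sigma, rng):
    """Return a copy of params with zero-mean Gaussian noise (std sigma) added."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def finetune_with_noise(params, data, lr=0.05, sigma=0.01, epochs=200, seed=0):
    """Toy SGD for y = w * x + b with noisy parameters (illustrative assumption).

    At each step the gradient of the squared error is evaluated at a noisy
    copy of the parameters, and the update is applied to the clean parameters.
    """
    rng = random.Random(seed)
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            noisy_w, noisy_b = add_gaussian_noise([w, b], sigma, rng)
            error = (noisy_w * x + noisy_b) - y
            w -= lr * error * x
            b -= lr * error
    return [w, b]


# Small synthetic "downstream" dataset generated from y = 2x + 1.
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
fitted = finetune_with_noise([0.0, 0.0], data)
print(fitted)
```

With a small `sigma` the fitted parameters stay close to the noise-free solution; setting `sigma=0.0` recovers plain SGD, which makes the noise scale a single knob to compare against a baseline.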