Despite serving as foundation models for a wide range of NLP benchmarks, pre-trained language models have shown limited ability to acquire implicit commonsense knowledge from self-supervision alone, compared with learning linguistic and factual knowledge that appears more explicitly in the surface patterns of text. In this work, we introduce commonsense knowledge transfer, a framework for transferring the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model. The framework first uses general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model, and then refines the language model with two self-supervised objectives, commonsense mask infilling and commonsense relation prediction, which align human language with the underlying commonsense knowledge. Empirical results show that our approach consistently improves the model's performance on downstream tasks that require commonsense reasoning. Moreover, we find that the improvement is more significant in the few-shot setting. This suggests that, by injecting commonsense knowledge into the parameters of language models, our approach helps them transfer to downstream tasks without extensive supervision.
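
To make the two self-supervised objectives more concrete, the following is a minimal sketch of how training pairs for commonsense mask infilling and commonsense relation prediction might be built from a general-text sentence. It is an illustration under stated assumptions, not the implementation used in this work: the relation names, the verbalisation templates, the `<mask>` token, and the `toy_knowledge_model` function (a stand-in for a neural commonsense knowledge model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical templates that verbalise a (head, relation, tail) triple.
# The relation names and wording are assumptions made for illustration.
TEMPLATES = {
    "xIntent": "{head} because they wanted {tail}",
    "xEffect": "{head} as a result, {tail}",
}

MASK = "<mask>"  # placeholder mask token (assumption)


@dataclass
class Example:
    source: str  # text given to the language model
    target: str  # text the language model should produce


def toy_knowledge_model(query: str) -> List[Tuple[str, str]]:
    """Stand-in for a neural commonsense knowledge model (assumption).

    A real model would generate inferences conditioned on the query;
    this stub returns fixed (relation, tail) pairs.
    """
    return [("xIntent", "to stay healthy"), ("xEffect", "they feel tired")]


def build_examples(
    sentence: str,
    knowledge_model: Callable[[str], List[Tuple[str, str]]],
) -> Tuple[List[Example], List[Example]]:
    """Build training pairs for the two objectives from one sentence."""
    head = sentence.rstrip(".")
    infilling, relation_prediction = [], []
    for relation, tail in knowledge_model(sentence):
        # Mask infilling: hide the inferred knowledge, ask the model to fill it in.
        masked = TEMPLATES[relation].format(head=head, tail=MASK)
        infilling.append(Example(source=masked, target=tail))
        # Relation prediction: show head and tail, ask for the relation label.
        relation_prediction.append(
            Example(source=f"{head} {MASK} {tail}", target=relation)
        )
    return infilling, relation_prediction


if __name__ == "__main__":
    infill, relpred = build_examples(
        "Alex goes jogging every morning.", toy_knowledge_model
    )
    for ex in infill + relpred:
        print(f"{ex.source!r} -> {ex.target!r}")
```

In this sketch both objectives are cast as text-to-text pairs so that a single sequence-to-sequence language model could be refined on them jointly; that framing is a design choice of the example rather than a detail stated above.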