With the advent of pretrained language models (LMs), increasing research efforts have been focusing on infusing commonsense and domain-specific knowledge to prepare LMs for downstream tasks. These works attempt to leverage knowledge graphs, the de facto standard of symbolic knowledge representation, along with pretrained LMs. While existing approaches have leveraged external knowledge, it remains an open question how to jointly incorporate knowledge graphs representing varying contexts, from local (e.g., sentence), to document-level, to global knowledge, to enable knowledge-rich exchange across these contexts. Such rich contextualization can be especially beneficial for long document understanding tasks since standard pretrained LMs are typically bounded by the input sequence length. In light of these challenges, we propose KALM, a Knowledge-Aware Language Model that jointly leverages knowledge in local, document-level, and global contexts for long document understanding. KALM first encodes long documents and knowledge graphs into the three knowledge-aware context representations. It then processes each context with context-specific layers, followed by a context fusion layer that facilitates knowledge exchange to derive an overarching document representation. Extensive experiments demonstrate that KALM achieves state-of-the-art performance on six long document understanding tasks and datasets. Further analyses reveal that the three knowledge-aware contexts are complementary and they all contribute to model performance, while the importance and information exchange patterns of different contexts vary with respect to different tasks and datasets.
翻译:随着预训练语言模型的兴起,越来越多的研究致力于融入常识知识与领域特定知识,以提升语言模型在下游任务中的表现。这类工作尝试利用知识图谱(符号化知识表示的事实标准)与预训练语言模型相结合。尽管现有方法已利用外部知识,但如何联合整合表征不同上下文(从局部(如句子)到文档级再到全局知识)的知识图谱,以实现跨上下文的富知识交互,仍是一个未解难题。这种丰富的上下文感知能力对长文档理解任务尤为关键,因为标准预训练语言模型通常受限于输入序列长度。针对这些挑战,我们提出KALM——一种知识感知语言模型,能够联合利用局部、文档级和全局上下文中的知识进行长文档理解。KALM首先将长文档与知识图谱编码为三种知识感知的上下文表征,随后通过上下文特定层分别处理每种上下文,并利用上下文融合层促进知识交互,最终生成全局统一的文档表征。大量实验表明,KALM在六项长文档理解任务与数据集上取得了最先进的性能。进一步分析揭示,三种知识感知上下文具有互补性且均有助于模型性能提升,但不同上下文的重要性及其信息交互模式会因任务与数据集的不同而产生差异。