Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task. However, a handful of recent efforts explore it in the cross-document setting (CrossDocRE). This setting is distinct from the single-document case because different documents often focus on disparate themes, whereas text within a document tends to have a single goal. Linking findings from disparate documents to identify new relationships is at the core of the popular literature-based knowledge discovery paradigm in biomedicine and other domains. Current CrossDocRE efforts do not consider domain knowledge, which is often assumed to be known to the reader when documents are authored. Here, we propose a novel approach, KXDocRE, that embeds domain knowledge of entities with the input text for cross-document RE. Our proposed framework has three main benefits over baselines: 1) it incorporates domain knowledge of entities along with the documents' text; 2) it offers interpretability by producing explanatory text for predicted relations between entities; and 3) it improves performance over prior methods.