Document-level relation extraction (DocRE) is the task of identifying all relations between each entity pair in a document. Evidence, defined as the sentences that contain clues to the relationship between an entity pair, has been shown to help DocRE systems focus on relevant text, thus improving relation extraction. However, evidence retrieval (ER) in DocRE faces two major issues: high memory consumption and limited availability of annotations. This work aims to address these issues to improve the use of ER in DocRE. First, we propose DREEAM, a memory-efficient approach that adopts evidence information as the supervisory signal, thereby guiding the attention modules of the DocRE system to assign high weights to evidence. Second, we propose a self-training strategy for DREEAM to learn ER from automatically generated evidence on massive data without evidence annotations. Experimental results reveal that our approach achieves state-of-the-art performance on the DocRED benchmark for both DocRE and ER. To the best of our knowledge, DREEAM is the first approach to employ ER self-training.
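The idea of using evidence as a supervisory signal for attention can be illustrated with a small sketch. The abstract does not specify the loss, so everything below is an assumption made for illustration: the function names `sentence_attention` and `evidence_loss`, the pooling of token attention by summation over each sentence, and the choice of a KL divergence between the normalized evidence labels and the sentence-level attention. It is not presented as the DREEAM implementation.

```python
import math

def sentence_attention(token_attention, sentence_spans):
    """Pool token-level attention into one weight per sentence.

    Assumption: each sentence is a half-open token span (start, end) and its
    weight is the sum of the token weights inside that span.
    """
    return [sum(token_attention[start:end]) for start, end in sentence_spans]

def evidence_loss(token_attention, sentence_spans, evidence_scores, eps=1e-12):
    """KL(evidence || attention) over sentences (hypothetical loss choice).

    evidence_scores holds one non-negative score per sentence: 0/1 for human
    annotations, or soft scores if the evidence is generated automatically.
    """
    total = sum(evidence_scores)
    if total == 0:
        return 0.0  # no evidence for this entity pair, so nothing to supervise
    target = [score / total for score in evidence_scores]
    pooled = sentence_attention(token_attention, sentence_spans)
    # Only sentences with non-zero target mass contribute to the divergence.
    return sum(t * math.log(t / max(a, eps))
               for t, a in zip(target, pooled) if t > 0)

# Three sentences of two tokens each; only the second sentence is evidence.
spans = [(0, 2), (2, 4), (4, 6)]
evidence = [0, 1, 0]
diffuse = [0.1, 0.1, 0.3, 0.3, 0.1, 0.1]   # attention spread over all sentences
focused = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]   # attention concentrated on evidence

loss_diffuse = evidence_loss(diffuse, spans, evidence)
loss_focused = evidence_loss(focused, spans, evidence)
print(loss_diffuse, loss_focused)
```

In this sketch the loss shrinks as attention mass moves onto the evidence sentence, which is the behavior the abstract describes. Because `evidence_scores` may be soft, the same function could in principle take automatically generated evidence as its target when no annotations exist, although how DREEAM produces and uses such evidence is not stated here.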