Document-level relation extraction (DocRE) predicts relations between entity pairs, which often requires long-range, context-dependent reasoning over a document. As a typical multi-label classification problem, DocRE must distinguish a small set of positive relations from a large majority of negative ones. This becomes even harder when the dataset contains a significant number of annotation errors. In this work, we aim to better integrate discriminability and robustness for DocRE. Specifically, we first design a loss function that endows both the probabilistic outputs and the internal representations with high discriminability, adapting entropy minimization and supervised contrastive learning to the multi-label, long-tailed setting. To mitigate the impact of label errors, we equip our method with a novel negative label sampling strategy that strengthens model robustness. In addition, we introduce two new data regimes that mimic more realistic scenarios with annotation errors and use them to evaluate our sampling strategy. Experimental results verify the effectiveness of each component and show that our method achieves new state-of-the-art results on the DocRED dataset, its recently cleaned version Re-DocRED, and the proposed data regimes.
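The abstract does not specify how negative labels are sampled, so the following is only a generic illustration of the idea, not the authors' procedure: a multi-label binary cross-entropy in which every annotated positive label contributes to the loss, while each negative label is kept with some probability. The function name `sampled_bce_loss` and the hyperparameter `neg_keep_prob` are hypothetical. One possible motivation, assumed here, is that some labels marked negative may be unannotated positives, so penalizing fewer of them may reduce the influence of such errors.

```python
import math
import random


def sampled_bce_loss(logits, labels, neg_keep_prob=0.5, rng=None):
    """Mean binary cross-entropy over all positives and a random subset of negatives.

    logits: list of floats, one score per relation class for one entity pair
    labels: list of 0/1 ints (1 = annotated positive relation)
    neg_keep_prob: hypothetical hyperparameter, probability of keeping each negative
    rng: optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    total, kept = 0.0, 0
    for z, y in zip(logits, labels):
        # Drop a negative label with probability 1 - neg_keep_prob.
        if y == 0 and rng.random() >= neg_keep_prob:
            continue
        # Numerically stable BCE with logits:
        # max(z, 0) - z * y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        kept += 1
    # Average over the labels that were actually used.
    return total / kept if kept else 0.0


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]
    labels = [1, 0, 0]
    print(sampled_bce_loss(logits, labels, neg_keep_prob=0.5, rng=random.Random(0)))
```

Setting `neg_keep_prob=1.0` recovers the ordinary binary cross-entropy over all labels, and `neg_keep_prob=0.0` trains on positives only; intermediate values trade off supervision from negatives against exposure to possibly mislabeled ones.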