Document-level relation extraction (RE) aims to identify relations between entities whose mentions span multiple sentences. Incomplete labeling in document-level RE has recently received increasing attention. Some studies have applied methods such as positive-unlabeled (PU) learning to address it, but considerable room for improvement remains. Motivated by this, we propose a positive-augmentation and positive-mixup positive-unlabeled metric learning framework (P3M). Specifically, we formulate document-level RE as a metric learning problem: we pull each entity pair embedding closer to the embedding of its corresponding relation while pushing it away from the none-class relation embedding. We further adapt positive-unlabeled learning to this loss objective. To improve the model's generalizability, we use dropout to augment positive samples and propose a positive-none-class mixup method. Extensive experiments show that P3M improves the F1 score by approximately 4 to 10 points in document-level RE with incomplete labeling and achieves state-of-the-art results in fully labeled scenarios. Furthermore, P3M is robust to bias in prior estimation when labels are incomplete.
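The abstract describes the objective only at a high level. The Python sketch below shows one way a positive-unlabeled metric-learning objective of this kind could be written. Every design choice in it is an assumption, not a detail of P3M. It uses squared Euclidean distance, a score equal to the distance to the none-class embedding minus the distance to the relation embedding, and a logistic loss. It substitutes the non-negative PU risk estimator (nnPU, Kiryo et al., 2017) for P3M's own PU adaptation. Positive augmentation is inverted dropout on positive embeddings, and mixup is a soft-label blend of a positive pair embedding with one treated as none-class. All function names (`score`, `nnpu_risk`, `pu_metric_loss`, `mixup_loss`) and parameters (`prior`, `p_drop`, `n_aug`, `lam`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(h, rel_emb, none_emb):
    """Hypothetical score: positive when h is closer to the relation
    embedding than to the none-class embedding."""
    return sq_dist(h, none_emb) - sq_dist(h, rel_emb)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_loss(s, y):
    """Logistic loss for score s and label y in {+1, -1}."""
    return softplus(-y * s)


def dropout(h, p, rng):
    """Inverted dropout: zero each coordinate with probability p and
    rescale the survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in h]


def nnpu_risk(pos, unl, rel_emb, none_emb, prior):
    """Non-negative PU risk (nnPU) for one relation class; `prior` is the
    assumed fraction of true positives for that class (hypothetical)."""
    sp = [score(h, rel_emb, none_emb) for h in pos]
    su = [score(h, rel_emb, none_emb) for h in unl]
    r_pos = prior * sum(logistic_loss(s, +1) for s in sp) / len(sp)
    r_pos_as_neg = prior * sum(logistic_loss(s, -1) for s in sp) / len(sp)
    r_unl_as_neg = sum(logistic_loss(s, -1) for s in su) / len(su)
    # Clamp the estimated negative-class risk at zero, as nnPU does.
    return r_pos + max(0.0, r_unl_as_neg - r_pos_as_neg)


def pu_metric_loss(pos, unl, rel_emb, none_emb, prior,
                   n_aug=1, p_drop=0.1, seed=0):
    """nnPU risk after adding n_aug dropout-perturbed copies of every
    positive embedding (hypothetical positive augmentation)."""
    rng = random.Random(seed)
    augmented = list(pos)
    for h in pos:
        augmented.extend(dropout(h, p_drop, rng) for _ in range(n_aug))
    return nnpu_risk(augmented, unl, rel_emb, none_emb, prior)


def mixup_loss(h_pos, h_none, rel_emb, none_emb, lam):
    """Blend a positive pair embedding with one treated as none-class and
    apply binary cross-entropy with soft target lam on sigmoid(score)."""
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h_pos, h_none)]
    s = score(h_mix, rel_emb, none_emb)
    return lam * softplus(-s) + (1.0 - lam) * softplus(s)


# Toy usage with 2-D embeddings (illustrative values only).
rel, none = [1.0, 0.0], [-1.0, 0.0]
pos = [[0.9, 0.1], [1.1, -0.2]]
unl = [[-0.8, 0.0], [0.7, 0.3], [-1.2, 0.1]]
print(pu_metric_loss(pos, unl, rel, none, prior=0.3))
print(mixup_loss(pos[0], unl[0], rel, none, lam=0.7))
```

In a real model the embeddings would be trainable tensors, and a per-class loss like this would be summed over relation classes. The `prior` argument is where an estimated class prior enters, so a biased estimate would shift the balance between the positive and unlabeled terms.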