Extracting relational triples from text is a crucial task for constructing knowledge bases. Recent joint entity and relation extraction models have reported high F1 scores ($\ge 90\%$) when extracting relational triples from free text. However, these models have been evaluated under restrictive experimental settings and on unrealistic datasets: they overlook sentences that contain no triples (zero-cardinality), which simplifies the task. In this paper, we present a benchmark study of state-of-the-art joint entity and relation extraction models under a more realistic setting. We include sentences that lack any triples in our experiments, which gives a more comprehensive evaluation. Our findings reveal a significant decline in the models' F1 scores in this realistic setting (approximately 10-15\% on one dataset and 6-14\% on another). Furthermore, we propose a two-step modeling approach that utilizes a simple BERT-based classifier. This approach improves the overall performance of these models in the realistic experimental setting.
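To make the effect of zero-cardinality sentences concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes the two-step approach uses the classifier as a filter: sentences predicted to contain no triples are skipped, and only the rest are passed to the joint extraction model. The names `two_step_extract`, `micro_f1`, `toy_has_triples`, and `toy_extract` are hypothetical, and the keyword rule and canned outputs are toy stand-ins for the BERT-based classifier and the joint extractor.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def two_step_extract(
    sentences: List[str],
    has_triples: Callable[[str], bool],
    extract: Callable[[str], Set[Triple]],
) -> List[Set[Triple]]:
    """Step 1: filter with a binary classifier. Step 2: extract from the rest."""
    return [extract(s) if has_triples(s) else set() for s in sentences]


def micro_f1(predicted: List[Set[Triple]], gold: List[Set[Triple]]) -> float:
    """Micro-averaged F1 over exactly matching triples."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the second sentence is zero-cardinality (no gold triples).
SENTENCES = ["Paris is the capital of France.", "It rained all day."]
GOLD: List[Set[Triple]] = [{("Paris", "capital_of", "France")}, set()]

# Stand-in for a joint extractor that emits a spurious triple on the
# zero-cardinality sentence (an assumed failure mode for illustration).
CANNED = {
    SENTENCES[0]: {("Paris", "capital_of", "France")},
    SENTENCES[1]: {("It", "related_to", "day")},
}


def toy_extract(sentence: str) -> Set[Triple]:
    return CANNED[sentence]


def toy_has_triples(sentence: str) -> bool:
    # Stand-in for the BERT-based classifier: a keyword rule, for the toy data only.
    return "capital" in sentence


def demo() -> Tuple[float, float]:
    one_step = [toy_extract(s) for s in SENTENCES]
    two_step = two_step_extract(SENTENCES, toy_has_triples, toy_extract)
    return micro_f1(one_step, GOLD), micro_f1(two_step, GOLD)
```

On this toy data, `demo()` returns an F1 of 2/3 for the extractor alone, because the spurious triple halves precision, and 1.0 once the filter removes the zero-cardinality sentence. The numbers illustrate the mechanism only and say nothing about the magnitudes reported above.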