Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, for populating long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method, which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generation by a substantial margin.
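The two-stage idea can be illustrated with a minimal sketch. This is not the L3X implementation: the function names, the `generate` callable standing in for an LLM call, and the passage-support threshold used as the pruning rule are all assumptions made for illustration. Stage 1 gathers candidates from every passage to favor recall; stage 2 keeps only candidates supported by enough passages to favor precision.

```python
from collections import Counter
from typing import Callable, Iterable


def extract_long_list(
    passages: Iterable[str],
    generate: Callable[[str], list[str]],
    min_support: int = 2,
) -> list[str]:
    """Hypothetical two-stage sketch: generate broadly, then prune."""
    # Stage 1 (recall-oriented): collect candidates from every passage.
    support = Counter()
    for passage in passages:
        # Count each candidate at most once per passage.
        for candidate in set(generate(passage)):
            support[candidate] += 1
    # Stage 2 (precision-oriented): keep candidates backed by enough passages.
    # This threshold is an assumed placeholder for the scrutinization step.
    return sorted(c for c, n in support.items() if n >= min_support)


def toy_generate(passage: str) -> list[str]:
    # Hypothetical stand-in for an LLM call: return known names in the passage.
    names = ["Alice", "Bob", "Carol"]
    return [n for n in names if n in passage]


passages = ["Alice met Bob.", "Bob thanked Carol.", "Alice and Bob left."]
result = extract_long_list(passages, toy_generate)
print(result)
```

Lowering `min_support` to 1 keeps every generated candidate, which shows the trade-off the abstract describes: the pruning stage gives up some recall in exchange for precision.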