Open Information Extraction (OpenIE) represents a crucial NLP task aimed at deriving structured information from unstructured text, unrestricted by relation type or domain. This survey paper provides an overview of OpenIE technologies spanning from 2007 to 2024, emphasizing a chronological perspective absent in prior surveys. It examines the evolution of task settings in OpenIE to align with the advances in recent technologies. The paper categorizes OpenIE approaches into rule-based, neural, and pre-trained large language models, discussing each within a chronological framework. Additionally, it highlights prevalent datasets and evaluation metrics currently in use. Building on this extensive review, the paper outlines potential future directions in terms of datasets, information sources, output formats, methodologies, and evaluation metrics.
翻译:开放信息抽取(OpenIE)是一项重要的自然语言处理任务,旨在从非结构化文本中提取结构化信息,不受关系类型或领域限制。本综述论文概述了2007年至2024年间开放信息抽取技术的发展历程,重点突出了以往综述中缺乏的时间脉络视角。论文审视了开放信息抽取任务设置如何随最新技术进步而演变,将开放信息抽取方法划分为基于规则的方法、神经网络方法和预训练大语言模型方法三大类,并分别从时间顺序角度进行阐述。此外,论文重点介绍了当前普遍使用的数据集和评估指标。基于这一广泛综述,论文从数据集、信息源、输出格式、方法论和评估指标五个方面展望了未来潜在的研究方向。