Large language models with instruction-following capabilities open the door to a wider group of users. However, for information extraction, a classic task in natural language processing, most task-specific systems align poorly with the long-tail, ad hoc extraction needs of non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. The task is to follow a user's instruction, extract the desired content from the associated text, and present it in a structured tabular format. The table headers can either be specified by the user or inferred from context by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, comprising automatically generated training data and a human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark show that ODIE substantially outperforms existing open-source models of similar size. Our code and dataset are available at https://github.com/yzjiao/On-Demand-IE.
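To make the task format concrete, the following is a minimal sketch of what a single on-demand extraction request and its tabular output might look like. It is not taken from the ODIE codebase or the InstructIE data: the `ExtractionRequest` class, the `render_table` helper, the example instruction, the example text, and the pipe-delimited table layout are all hypothetical and chosen only for illustration. The rows are written by hand to stand in for model output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionRequest:
    """Hypothetical container for one on-demand IE query (not the ODIE API)."""
    instruction: str
    text: str
    # None means the headers are left for the model to infer from context.
    headers: Optional[List[str]] = None


def render_table(headers: List[str], rows: List[List[str]]) -> str:
    """Render extracted rows as a pipe-delimited table (assumed layout)."""
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("each row must have one cell per header")
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in rows)
    return "\n".join(lines)


# Made-up example input with user-specified headers.
request = ExtractionRequest(
    instruction="List each drug and its side effect.",
    text="Aspirin may cause nausea. Ibuprofen may cause dizziness.",
    headers=["Drug", "Side effect"],
)

# Hand-written rows standing in for what a model might return.
rows = [["Aspirin", "nausea"], ["Ibuprofen", "dizziness"]]
table = render_table(request.headers, rows)
print(table)
```

Leaving `headers` as `None` corresponds to the open setting in which the model infers the table headers itself; in that case the headers would come back with the model output rather than from the request.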