We introduce a new Information Extraction (IE) task, Instruction-based IE, in which the system must follow specific instructions or guidelines to extract information. To facilitate research in this area, we construct a dataset called InstructIE, consisting of 270,000 weakly supervised instances from Chinese Wikipedia and 1,000 high-quality crowdsourced annotated instances. We further evaluate various baseline models on InstructIE. The results reveal that although current models exhibit promising performance, there is still room for improvement. Furthermore, we conduct a comprehensive case study that underlines the challenges inherent in the Instruction-based IE task. Code and dataset are available at https://github.com/zjunlp/DeepKE/tree/main/example/llm.