The difficulty of information extraction lies in handling task-specific label schemas and heterogeneous data structures. Recent work has proposed methods based on large language models that model different information extraction tasks uniformly. However, these methods remain weak at information extraction in languages other than English, such as Chinese. In this paper, we propose YAYI-UIE, an end-to-end chat-enhanced instruction tuning framework for universal information extraction that supports both Chinese and English. Specifically, we use dialogue data and information extraction data jointly to improve information extraction performance. Experimental results show that the proposed framework achieves state-of-the-art performance on Chinese datasets and comparable performance on English datasets, under both supervised and zero-shot settings.
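To make the idea of modeling different extraction tasks uniformly more concrete, the following is a minimal sketch of how tasks with different label schemas could be cast into one instruction/response format. It is an illustration only, not the YAYI-UIE prompt format: the `IETask` class, the `build_prompt` and `build_target` helpers, and the template wording are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class IETask:
    """A hypothetical description of one extraction task."""
    name: str          # e.g. "NER", "RE", "EE"
    schema: list[str]  # task-specific label set


def build_prompt(task: IETask, text: str) -> str:
    """Render any task as one instruction string (assumed template)."""
    labels = ", ".join(task.schema)
    return (
        f"Task: {task.name}\n"
        f"Labels: {labels}\n"
        f"Text: {text}\n"
        "Answer as JSON mapping each label to the extracted items."
    )


def build_target(task: IETask, extracted: dict[str, list]) -> str:
    """Serialize the expected answer; labels with no match map to []."""
    full = {label: extracted.get(label, []) for label in task.schema}
    # ensure_ascii=False keeps Chinese characters readable in the target.
    return json.dumps(full, ensure_ascii=False)


# The same two helpers serve an English and a Chinese example.
ner = IETask("NER", ["person", "organization"])
prompt = build_prompt(ner, "Alan Turing worked at Bletchley Park.")
target = build_target(ner, {"person": ["Alan Turing"]})

zh_ner = IETask("NER", ["人物", "机构"])
zh_target = build_target(zh_ner, {"机构": ["北京大学"]})
```

Because every task is reduced to a label list plus a text, adding a new task or language in this sketch only requires a new `IETask` instance rather than a new output format.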