The main difficulty of information extraction lies in handling task-specific label schemas and heterogeneous data structures. Recent work has proposed methods based on large language models that model different information extraction tasks uniformly. However, these methods remain weak at information extraction in languages other than English, such as Chinese. In this paper, we propose an end-to-end chat-enhanced instruction tuning framework for universal information extraction (YAYI-UIE), which supports both Chinese and English. Specifically, we use dialogue data and information extraction data jointly to improve information extraction performance. Experimental results show that the proposed framework achieves state-of-the-art performance on Chinese datasets and comparable performance on English datasets in both supervised and zero-shot settings.
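To make the idea of uniform modeling concrete, the following is a minimal sketch of how different extraction tasks might be cast into one instruction-to-JSON format for instruction tuning. The prompt wording, the `build_instruction` and `make_example` helpers, the task keys, and the label names are all illustrative assumptions; the abstract does not specify the actual prompt template or output schema used by YAYI-UIE.

```python
import json

# Hypothetical mapping from task key to the noun used in the prompt.
# These names are assumptions for illustration, not the framework's own.
TASK_TARGETS = {"ner": "entities", "re": "relations", "ee": "events"}


def build_instruction(task, text, labels):
    """Render any supported IE task as one plain-text instruction."""
    if task not in TASK_TARGETS:
        raise ValueError(f"unknown task: {task}")
    # ensure_ascii=False keeps Chinese labels readable in the prompt.
    label_list = json.dumps(labels, ensure_ascii=False)
    return (
        f"Text: {text}\n"
        f"Extract all {TASK_TARGETS[task]} of the following types "
        f"and answer in JSON: {label_list}"
    )


def make_example(task, text, labels, answer):
    """Pair an instruction with its gold answer serialized as JSON."""
    return {
        "instruction": build_instruction(task, text, labels),
        "output": json.dumps(answer, ensure_ascii=False),
    }


# An English entity recognition example and a Chinese relation extraction
# example share the same structure, so one model can be tuned on both.
ner = make_example(
    "ner",
    "Marie Curie was born in Warsaw.",
    ["person", "location"],
    {"person": ["Marie Curie"], "location": ["Warsaw"]},
)
zh = make_example(
    "re",
    "居里夫人出生于华沙。",
    ["出生地"],
    {"出生地": [{"subject": "居里夫人", "object": "华沙"}]},
)
```

Because every task reduces to the same instruction and JSON output pair, the label schema becomes part of the input rather than part of the model architecture, which is one plausible way to handle task-specific schemas in both languages.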