Information extraction (IE) systems aim to automatically extract structured information, such as named entities, relations between entities, and events, from unstructured text. While most existing work addresses a single IE task, unified models that handle various IE tasks with one model have recently achieved great success. Despite this success, these models employ a one-stage learning strategy, i.e., they directly learn to extract the target structure from the input text, which is at odds with how humans learn. In this paper, we propose a unified easy-to-hard learning framework for IE that mimics the human learning process and consists of three stages: the easy stage, the hard stage, and the main stage. By breaking the learning process into multiple stages, our framework helps the model acquire general IE task knowledge and improves its generalization ability. Extensive experiments across four IE tasks demonstrate the effectiveness of our framework, which achieves new state-of-the-art results on 13 out of 17 datasets. Our code is available at \url{https://github.com/DAMO-NLP-SG/IE-E2H}.
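The contrast between one-stage and multi-stage learning can be illustrated with a minimal sketch of sequential staged training. This is not the authors' implementation (see the linked repository for that). It assumes only that each stage is a separate set of training examples and that a single model's parameters carry over from one stage to the next. The abstract does not specify what each stage trains on, so the stage data below are hypothetical placeholders, and `Stage`, `train_easy_to_hard`, and `toy_step` are illustrative names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A training example is an (input text, target structure) pair; plain strings here.
Example = Tuple[str, str]


@dataclass
class Stage:
    name: str                # e.g. "easy", "hard", "main"
    examples: List[Example]  # stage-specific training data (hypothetical)
    epochs: int = 1


def train_easy_to_hard(
    model_state: Dict[str, int],
    stages: List[Stage],
    train_step: Callable[[Dict[str, int], Example], None],
) -> List[str]:
    """Train one model through the stages in order, carrying its state forward.

    Returns a log recording which stage each update came from.
    """
    log: List[str] = []
    for stage in stages:
        for _ in range(stage.epochs):
            for example in stage.examples:
                train_step(model_state, example)  # mutates model_state in place
                log.append(stage.name)
    return log


def toy_step(state: Dict[str, int], example: Example) -> None:
    # Stand-in for a real gradient update: just counts the updates applied.
    state["updates"] = state.get("updates", 0) + 1


# Hypothetical usage: three stages trained in sequence on one shared state.
stages = [
    Stage("easy", [("x1", "y1"), ("x2", "y2")]),
    Stage("hard", [("x3", "y3")]),
    Stage("main", [("x4", "y4")], epochs=2),
]
state: Dict[str, int] = {}
log = train_easy_to_hard(state, stages, toy_step)
print(log)               # ['easy', 'easy', 'hard', 'main', 'main']
print(state["updates"])  # 5
```

Under these assumptions, the one-stage strategy criticized above corresponds to passing only the `"main"` stage. The staged variant differs only in that earlier stages update the same state before the main stage begins.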