Information extraction (IE) systems aim to automatically extract structured information, such as named entities, relations between entities, and events, from unstructured text. While most existing work addresses a single IE task, modeling various IE tasks universally with one model has recently achieved great success. Despite this success, such universal models employ a one-stage learning strategy, i.e., they learn directly to extract the target structure from the input text, which contradicts the human learning process. In this paper, we propose a unified easy-to-hard learning framework for IE that mimics the human learning process and consists of three stages: the easy stage, the hard stage, and the main stage. By breaking the learning process into multiple stages, our framework helps the model acquire general IE task knowledge and improves its generalization ability. Extensive experiments across four IE tasks demonstrate the effectiveness of our framework, which achieves new state-of-the-art results on 13 out of 17 datasets. Our code is available at \url{https://github.com/DAMO-NLP-SG/IE-E2H}.