Information extraction (IE) has been studied extensively. Existing methods always follow a fixed extraction order for complex IE tasks in which multiple elements must be extracted from a single instance, such as event extraction. However, our experiments on several complex IE datasets show that the extraction order significantly affects the extraction results for a large portion of instances, and that the ratio of sentences sensitive to extraction order increases dramatically with the complexity of the IE task. Therefore, this paper proposes a novel adaptive ordered IE paradigm that finds the optimal element extraction order for each instance, so as to achieve the best extraction results. We also propose a reinforcement learning (RL) based framework that dynamically generates the optimal extraction order for each instance. Additionally, we propose a co-training framework adapted to RL to mitigate exposure bias during the extractor training phase. Extensive experiments on several public datasets demonstrate that our proposed method outperforms previous methods and effectively improves performance on various IE tasks, especially complex ones.