Event extraction (EE) plays an important role in many industrial application scenarios, and high-quality EE methods require large amounts of manually annotated data to train supervised learning models. However, annotated data is very costly to obtain, especially for domain events, whose annotation requires experts from the corresponding domain. We therefore introduce active learning (AL) to reduce the cost of event annotation. Existing AL methods, however, have two main problems that make them poorly suited to event extraction. First, existing pool-based selection strategies are limited in terms of computational cost and sample validity. Second, existing evaluations of sample importance make no use of local sample information. In this paper, we present a novel deep AL method for EE. We propose a batch-based selection strategy and a Memory-Based Loss Prediction (MBLP) model to select unlabeled samples efficiently. During the selection process, we use an internal-external sample loss ranking method that evaluates sample importance using local information. Finally, we propose a delayed training strategy to train the MBLP model. Extensive experiments on three domain datasets show that our method outperforms other state-of-the-art methods.
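To make the idea of batch-based selection with internal-external loss ranking more concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify how the ranking is computed, so everything here is an assumption: predicted losses are passed in as plain numbers rather than produced by an MBLP model, the "internal" rank compares a sample with the other samples in its batch, the "external" rank compares it with a bounded memory of predicted losses from earlier batches, and the names `select_from_batch`, `memory`, and `alpha` are hypothetical.

```python
from collections import deque


def select_from_batch(batch_losses, memory, k, alpha=0.5):
    """Pick k sample indices from one batch by combined loss rank.

    Hypothetical sketch: `batch_losses` stands in for the output of a
    loss prediction model, and `memory` holds predicted losses seen in
    earlier batches. Neither the scoring rule nor `alpha` comes from
    the paper.
    """
    n = len(batch_losses)
    scores = []
    for loss in batch_losses:
        # Internal rank: share of samples in this batch with a lower predicted loss.
        internal = sum(other < loss for other in batch_losses) / max(n - 1, 1)
        # External rank: share of remembered losses that are lower.
        # With an empty memory, fall back to the internal rank.
        if memory:
            external = sum(m < loss for m in memory) / len(memory)
        else:
            external = internal
        scores.append(alpha * internal + (1 - alpha) * external)

    # Keep the k highest-scoring samples, reported in batch order.
    chosen = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Remember this batch's predicted losses for later comparisons.
    memory.extend(batch_losses)
    return sorted(chosen)


if __name__ == "__main__":
    memory = deque(maxlen=100)  # bounded memory of past predicted losses
    print(select_from_batch([0.1, 0.9, 0.5, 0.3], memory, k=2))
    print(select_from_batch([0.2, 0.4], memory, k=1))
```

Because each batch is scored only against itself and a fixed-size memory, the cost per batch does not grow with the size of the unlabeled pool, which is the kind of saving a batch-based strategy aims for over pool-wide ranking.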