Semi-supervised learning is an important approach to extracting entities and relations from limited labeled data. However, existing semi-supervised work handles the two tasks, Named Entity Recognition (NER) and Relation Extraction (RE), separately. It thereby ignores both the cross-correlation between entity and relation instances and the existence of similar instances across unlabeled data. To alleviate these issues, we propose Jointprop, a Heterogeneous Graph-based Propagation framework for joint semi-supervised entity and relation extraction, which captures the global structure information between the individual tasks and exploits interactions within unlabeled data. Specifically, we construct a unified span-based heterogeneous graph from entity and relation candidates and propagate class labels based on confidence scores. We then employ a propagation learning scheme to leverage the affinities between labeled and unlabeled samples. Experiments on benchmark datasets show that our framework outperforms state-of-the-art semi-supervised approaches on both NER and RE. We show that joint semi-supervised learning of the two tasks benefits from their codependency, and our results validate the importance of utilizing the information shared between unlabeled data.
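The idea of propagating class labels over a graph of entity and relation candidates, and keeping only confident predictions, can be illustrated with a minimal sketch. This is a generic label-spreading routine over a cosine-similarity graph, not the Jointprop algorithm itself. The function names, the toy feature vectors, and the parameters `alpha`, `iters`, and `threshold` are all assumptions introduced for illustration. In the abstract's setting, each node would presumably be a span-based entity or relation candidate.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def propagate_labels(features, labels, num_classes,
                     alpha=0.8, iters=50, threshold=0.6):
    """Generic graph label spreading (illustrative, not the paper's scheme).

    features: one vector per node (hypothetical candidate embeddings).
    labels:   class index for labeled nodes, None for unlabeled ones.
    Returns one (pseudo_label or None, confidence) pair per node.
    """
    n = len(features)
    # Affinity matrix: non-negative cosine similarity, no self-loops.
    w = [[max(cosine(features[i], features[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot seed matrix Y; rows of unlabeled nodes stay all-zero.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)]
         for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # Update rule: F <- alpha * S F + (1 - alpha) * Y
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1 - alpha) * y[i][c]
              for c in range(num_classes)] for i in range(n)]
    results = []
    for i in range(n):
        total = sum(f[i])
        if total == 0:
            results.append((None, 0.0))
            continue
        # Confidence is the winning class's share of the node's total score.
        conf, cls = max((score / total, c) for c, score in enumerate(f[i]))
        # Keep a pseudo-label only when its confidence clears the threshold.
        results.append((cls if conf >= threshold else None, conf))
    return results

# Toy graph: nodes 0 and 2 are labeled, nodes 1 and 3 are unlabeled.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, None, 1, None]
results = propagate_labels(features, labels, num_classes=2)
```

In this toy run, each unlabeled node picks up the class of the labeled node it most resembles. Raising `threshold` makes the routine return `None` for more nodes, which mimics discarding low-confidence pseudo-labels.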