Nested Named Entity Recognition (NNER) addresses the recognition of overlapping entities. Compared with Flat Named Entity Recognition (FNER), annotated corpora for NNER are scarce. Data augmentation is an effective way to compensate for insufficient annotated data, but augmentation methods for NNER remain largely unexplored: because entities can be nested, existing augmentation methods cannot be applied directly to NNER tasks. In this work, we therefore focus on data augmentation for NNER and adopt a more expressive structure, Composited-Nested-Label Classification (CNLC), in which constituents are formed by combining nested words with nested labels, to model nested entities. The dataset is then augmented using Composited-Nested-Learning (CNL). In addition, we propose a Confidence Filtering Mechanism (CFM) to select generated data more efficiently. Experimental results show that this approach improves performance on ACE2004 and ACE2005 and alleviates the impact of sample imbalance.
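To make the two central ideas concrete, the following is a minimal sketch of what a nested entity looks like as labeled spans and of how generated samples might be filtered by confidence. It is not the method described in this work: the abstract does not specify how CFM scores or selects samples, so the `confidence` field, the `threshold` value, and all names here (`GeneratedSample`, `is_nested`, `filter_by_confidence`) are hypothetical, and the filter is assumed to be a simple threshold on a per-sample score.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A span is (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]


@dataclass
class GeneratedSample:
    tokens: List[str]
    spans: List[Span]
    confidence: float  # hypothetical model score in [0, 1]; an assumption


def is_nested(outer: Span, inner: Span) -> bool:
    """Return True if `inner` lies inside `outer` and has different bounds."""
    return (
        outer[0] <= inner[0]
        and inner[1] <= outer[1]
        and outer[:2] != inner[:2]
    )


def has_nested_entities(spans: List[Span]) -> bool:
    """Return True if any span in the list contains another span."""
    return any(is_nested(a, b) for a in spans for b in spans if a is not b)


def filter_by_confidence(
    samples: List[GeneratedSample], threshold: float = 0.8
) -> List[GeneratedSample]:
    """Keep generated samples whose score reaches the (assumed) threshold."""
    return [s for s in samples if s.confidence >= threshold]


# "France" (GPE) is nested inside "The president of France" (PER).
tokens = ["The", "president", "of", "France", "spoke"]
nested_spans: List[Span] = [(0, 4, "PER"), (3, 4, "GPE")]

samples = [
    GeneratedSample(tokens, nested_spans, confidence=0.9),
    GeneratedSample(tokens, [(3, 4, "GPE")], confidence=0.5),
]
kept = filter_by_confidence(samples, threshold=0.8)
```

A flat tagging scheme assigns one label per token, so it cannot mark "France" as both part of a PER entity and a GPE entity; representing entities as spans avoids that limitation, which is why augmentation methods built for flat labels do not transfer directly.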