Integrating lexicon information into character-level sequences has proven effective for exploiting word boundary and semantic information in Chinese named entity recognition (NER). However, prior approaches usually integrate word information through feature weighting and position coupling, ignoring the semantic and contextual correspondence between fine-grained semantic units in the character-word space. To address this issue, we propose a Unified Lattice Graph Fusion (ULGF) approach for Chinese NER. By converting the lattice structure into a unified graph, ULGF explicitly captures various semantic and boundary relations across different semantic units through the adjacency matrix. We stack multiple graph-based intra-source self-attention and inter-source cross-gating fusion layers, which iteratively carry out semantic interactions to learn node representations. To alleviate over-reliance on word information, we further leverage lexicon entity classification as an auxiliary task. Experiments on four Chinese NER benchmark datasets demonstrate the superiority of our ULGF approach.
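
To make the lattice-to-graph conversion concrete, the following minimal Python sketch treats every character and every lexicon-matched word as a node of one graph and records their relations in a single adjacency matrix. The edge types used here (self-loops, adjacent-character edges, and word-to-character containment edges), the function names `match_lexicon` and `build_unified_graph`, and the toy lexicon are all assumptions for illustration; the abstract does not specify the actual relation set used by ULGF.

```python
def match_lexicon(chars, lexicon, max_len=4):
    """Return (start, end, word) spans, end inclusive, for multi-character lexicon words."""
    spans = []
    n = len(chars)
    for start in range(n):
        # Candidate words span 2..max_len characters and must stay inside the sentence.
        for end in range(start + 1, min(n, start + max_len)):
            word = "".join(chars[start:end + 1])
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def build_unified_graph(chars, lexicon, max_len=4):
    """Build one adjacency matrix over character nodes followed by word nodes.

    Nodes 0..n-1 are characters; nodes n..n+m-1 are matched words.
    The edge types below are illustrative assumptions, not the paper's definition.
    """
    spans = match_lexicon(chars, lexicon, max_len)
    n, m = len(chars), len(spans)
    size = n + m
    adj = [[0] * size for _ in range(size)]

    # Self-loops so each node keeps its own information.
    for i in range(size):
        adj[i][i] = 1

    # Sequential edges between neighbouring characters.
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1

    # Containment edges between a word node and every character it covers.
    for k, (start, end, _) in enumerate(spans):
        w = n + k
        for c in range(start, end + 1):
            adj[w][c] = adj[c][w] = 1

    return spans, adj


# Toy example with a hypothetical lexicon.
chars = list("南京市长江大桥")
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
spans, adj = build_unified_graph(chars, lexicon)
```

In this toy sentence the character 市 is linked to both 南京市 and 市长, so the ambiguity between the two segmentations is visible directly in the graph structure. A graph attention layer could use such a matrix as its attention mask.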