Chinese Named Entity Recognition (NER) is an important information extraction task with a significant impact on downstream applications. Because Chinese text lacks natural word separators, previous NER methods have mostly relied on external dictionaries to enrich the semantic and boundary information of Chinese words. However, such methods may introduce noise that degrades recognition accuracy. To address this, we propose a character relation enhanced Chinese NER model (CRENER). The model defines four types of tags that reflect the relationships between characters and models these relationships at a fine-grained level using three kinds of relations: adjacency relations between characters, relations between characters and tags, and relations between tags. This allows it to identify entity boundaries more accurately and improves Chinese NER accuracy. Specifically, we transform the Chinese NER task into a character-character relation classification task, ensuring accurate entity boundary recognition through joint modeling of relation tags. To enhance the model's understanding of contextual information, CRENER further employs an adapted Transformer encoder that combines unscaled direction-aware and distance-aware masked self-attention mechanisms. Moreover, a relation representation enhancement module models the predefined relation tags, effectively mining the relation representations between characters and tags. Experiments on four well-known Chinese NER benchmark datasets show that the proposed model outperforms state-of-the-art baselines, and ablation experiments further demonstrate the effectiveness of the proposed model.
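To make the reformulation of NER as character-character relation classification more concrete, the following is a minimal sketch of how entity spans could be encoded as a grid of character-pair tags and decoded back. The abstract states that four tag types are defined but does not name them, so the tag names `NONE`, `NEXT`, `SINGLE`, and `BOUNDARY`, the helper functions, and the example sentence are all hypothetical and chosen only for illustration; they are not the tag set or procedure used by CRENER. Entity types are omitted for brevity.

```python
from typing import List, Tuple

# Hypothetical relation tags (illustration only; the abstract does not name its four tags).
NONE = 0      # no relation between the two characters
NEXT = 1      # character j directly follows character i inside the same entity
SINGLE = 2    # character i forms a one-character entity by itself
BOUNDARY = 3  # character i is the entity tail and character j is the entity head

Span = Tuple[int, int]  # (start, end) character indices, both inclusive


def spans_to_grid(n: int, spans: List[Span]) -> List[List[int]]:
    """Encode entity spans as an n x n grid of character-pair tags."""
    grid = [[NONE] * n for _ in range(n)]
    for start, end in spans:
        if start == end:
            grid[start][start] = SINGLE
            continue
        for i in range(start, end):
            grid[i][i + 1] = NEXT      # adjacency links (upper triangle)
        grid[end][start] = BOUNDARY    # tail-to-head link (lower triangle)
    return grid


def grid_to_spans(grid: List[List[int]]) -> List[Span]:
    """Decode spans: a span is kept only if its boundary tag and all adjacency tags agree."""
    n = len(grid)
    spans: List[Span] = [(i, i) for i in range(n) if grid[i][i] == SINGLE]
    for end in range(n):
        for start in range(end):
            if grid[end][start] != BOUNDARY:
                continue
            if all(grid[i][i + 1] == NEXT for i in range(start, end)):
                spans.append((start, end))
    return sorted(spans)


# Hypothetical example sentence: "王小明" (characters 0-2) and "北京" (characters 4-5).
text = "王小明在北京工作"
gold_spans = [(0, 2), (4, 5)]
grid = spans_to_grid(len(text), gold_spans)
decoded = grid_to_spans(grid)
```

In this sketch a classifier would predict one tag per character pair, and decoding requires the boundary tag and the adjacency tags to agree, which is one simple way that jointly modeled relation tags can constrain entity boundaries.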