Relation extraction is a basic task in information extraction within natural language processing, and a core step in information extraction, natural language understanding, and information retrieval. Existing relation extraction methods do not effectively solve the problem of triple overlap. This paper proposes CasAug, a model that builds on the CasRel framework and adds a semantic enhancement mechanism, which addresses this problem to a certain extent. The mechanism enhances the semantics of each candidate subject identified by the model. First, the candidate subject is pre-classified by relation based on its semantic encoding. Next, semantic similarity against a subject lexicon is computed to obtain words similar to the candidate subject. For each relation, an attention mechanism then computes the contribution of each similar word to the candidate subject. Finally, the per-relation enhanced semantics are weighted by the relation pre-classification results to obtain the enhanced semantics of the candidate subject, which are combined with the candidate subject and sent to the object and relation extraction module to complete the extraction of relation triples. Experimental results show that CasAug improves relation extraction performance over the baseline model, and that it is also better than the baseline at handling overlapping triples and extracting multiple relations. This indicates that the proposed semantic enhancement mechanism further reduces redundant relation judgments and alleviates the triple overlap problem.
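The semantic enhancement steps described above can be illustrated with a minimal NumPy sketch. It is not the CasAug implementation; it only mirrors the order of operations named in the abstract for a single possible (candidate) subject. Everything specific is an assumption made for illustration: the function name `enhance_subject`, the linear pre-classifier `classifier_w`, the per-relation query vectors `relation_queries`, cosine similarity as the similarity measure, the `top_k` cutoff, scaled dot-product attention, and addition as the way the enhanced semantics are combined with the subject encoding.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def enhance_subject(subject_vec, lexicon_vecs, relation_queries, classifier_w, top_k=3):
    """Hypothetical sketch of semantic enhancement for one candidate subject.

    subject_vec:      (d,)   semantic encoding of the candidate subject
    lexicon_vecs:     (n, d) encodings of the words in the subject lexicon
    relation_queries: (r, d) assumed per-relation attention queries
    classifier_w:     (r, d) assumed linear relation pre-classifier
    """
    d = subject_vec.shape[0]

    # Step 1: relation pre-classification from the subject encoding.
    rel_probs = softmax(classifier_w @ subject_vec)            # (r,)

    # Step 2: cosine similarity against the lexicon, keep the top-k words.
    norms = np.linalg.norm(lexicon_vecs, axis=1) * np.linalg.norm(subject_vec)
    sims = (lexicon_vecs @ subject_vec) / np.maximum(norms, 1e-12)
    top_idx = np.argsort(-sims)[:top_k]
    similar = lexicon_vecs[top_idx]                            # (k, d)

    # Step 3: per-relation attention over the similar words
    # (assumed form: scaled dot-product with relation-specific queries).
    attn = softmax(relation_queries @ similar.T / np.sqrt(d), axis=1)  # (r, k)
    per_relation = attn @ similar                              # (r, d)

    # Step 4: weight each relation's enhanced semantics by its
    # pre-classification probability.
    enhanced = rel_probs @ per_relation                        # (d,)

    # Step 5: combine with the subject encoding (addition is an assumption).
    return subject_vec + enhanced, rel_probs, top_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, r = 8, 20, 4
    out, probs, idx = enhance_subject(
        rng.normal(size=d),
        rng.normal(size=(n, d)),
        rng.normal(size=(r, d)),
        rng.normal(size=(r, d)),
    )
    print(out.shape, probs.round(3), idx)
```

In this sketch the returned vector is what would be passed on to the object and relation extraction module; in a trained model the classifier and query parameters would be learned rather than supplied by hand.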