To resolve semantic ambiguity in text, we propose a model that combines a knowledge graph with an improved attention mechanism. An existing knowledge base is used to enrich the text with relevant contextual concepts, and the model operates at both the character and word levels, integrating these concepts to deepen its understanding of the text. We first use information gain to select important words. An encoder-decoder framework then encodes the text together with its related concepts. A local attention mechanism adjusts the weight of each concept, reducing the influence of irrelevant or noisy concepts during classification. We also improve the formula for computing attention scores in the local self-attention mechanism so that words with different frequencies of occurrence in the text receive higher attention scores. Finally, the model employs a bidirectional gated recurrent unit (Bi-GRU), which is effective at extracting features from text, to improve classification accuracy. On the AGNews, Ohsumed, and TagMyNews datasets, the model achieves accuracies of 75.1%, 58.7%, and 68.5%, respectively, demonstrating its effectiveness on classification tasks.
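The word-selection step can be illustrated with a minimal sketch of the standard information gain criterion, IG(w) = H(C) - H(C | w), where the feature is whether word w occurs in a document. This is not necessarily the exact procedure used in the model: the function names (`entropy`, `information_gain`, `select_important_words`), the `top_k` cutoff, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, word):
    """IG of the binary feature 'word occurs in the document' w.r.t. the label."""
    present = [y for tokens, y in zip(docs, labels) if word in tokens]
    absent = [y for tokens, y in zip(docs, labels) if word not in tokens]
    n = len(labels)
    # Conditional entropy H(C | word), weighted by how often the word is present/absent.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_important_words(docs, labels, top_k):
    """Return the top_k vocabulary words ranked by information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted({w for tokens in docs for w in tokens})
    scored = [(information_gain(docs, labels, w), w) for w in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:top_k]]


# Hypothetical toy corpus: tokenized documents with their class labels.
docs = [
    ["stocks", "rise", "the"],
    ["stocks", "fall", "the"],
    ["team", "wins", "the"],
    ["team", "loses", "the"],
]
labels = ["business", "business", "sports", "sports"]

important = select_important_words(docs, labels, top_k=2)
```

In this toy corpus, "stocks" and "team" each separate the two classes perfectly and so rank highest, while "the" appears in every document and carries no information about the class.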