In recent years, short text matching has been widely applied in advertising search and recommendation. The task is difficult because short texts carry little semantic information and their words are often ambiguous. Previous work has introduced complement sentences or knowledge bases to supply additional features. However, these methods do not fully model the interaction between the original sentence and its complement sentence, and they do not account for the noise that an external knowledge base can introduce. This paper therefore proposes a short text matching model that combines contrastive learning with external knowledge. A generative model produces a complement sentence for each original sentence, and contrastive learning guides the model toward a more semantically meaningful encoding of the original sentence. To limit noise, we treat keywords as the main semantics of the original sentence, use them to retrieve the corresponding knowledge words from the knowledge base, and construct a knowledge graph from the results. A graph encoding model then integrates this knowledge into the matching model. The proposed model achieves state-of-the-art performance on two publicly available Chinese text matching datasets, demonstrating its effectiveness.
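The contrastive step described above, which pulls the encoding of an original sentence toward the encoding of its generated complement sentence, can be illustrated with a minimal sketch. The abstract does not state the loss function, so the InfoNCE-style loss, the cosine similarity, the temperature value, and the toy two-dimensional vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(originals, complements, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (assumed loss, not from the paper).

    originals[i] and complements[i] form the positive pair; every other
    complement in the batch serves as a negative for originals[i].
    """
    total = 0.0
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, c) / temperature for c in complements]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(originals)


# Hypothetical sentence encodings; a real model would produce these.
originals = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each complement close to its original
swapped = [[0.1, 0.9], [0.9, 0.1]]   # complements paired with the wrong original

loss_aligned = info_nce(originals, aligned)
loss_swapped = info_nce(originals, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each original sentence is most similar to its own complement and large when it is more similar to another sentence's complement, which is the signal a contrastive objective of this kind would use to shape the encoder.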