In recent years, short text matching has been widely applied in advertising search and recommendation. Its main difficulty is that short texts carry little semantic information and are prone to word ambiguity. Previous work has introduced complement sentences or knowledge bases to supply additional feature information. However, these methods do not allow sufficient interaction between the original sentence and the complement sentence, and they do not consider the noise that external knowledge bases may introduce. This paper therefore proposes a short text matching model that combines contrastive learning with external knowledge. The model uses a generative model to produce a complement sentence for each input and applies contrastive learning to guide the model toward a more semantically meaningful encoding of the original sentence. In addition, to limit noise, we treat keywords as the core semantics of the original sentence, use them to retrieve the corresponding knowledge words from the knowledge base, and construct a knowledge graph from the results; a graph encoding model then integrates this knowledge-base information into the matching model. The proposed model achieves state-of-the-art performance on two publicly available Chinese text matching datasets, demonstrating its effectiveness.
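The abstract does not specify the form of the contrastive objective. As a minimal sketch, assuming an in-batch InfoNCE loss (a common choice, not confirmed by the source), each original sentence's encoding is pulled toward the encoding of its own generated complement sentence and pushed away from the complements of the other sentences in the batch. The function names, the use of cosine similarity, and the temperature `tau = 0.05` are illustrative assumptions; in the real model the vectors would come from its sentence encoder, which is not shown here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(originals: List[Vector], complements: List[Vector],
                  tau: float = 0.05) -> float:
    """Hypothetical in-batch InfoNCE loss (an assumed objective, not the paper's).

    originals[i] and complements[i] form the positive pair; every
    complements[j] with j != i acts as a negative for originals[i].
    """
    if not originals or len(originals) != len(complements):
        raise ValueError("need a non-empty batch of aligned sentence pairs")
    total = 0.0
    for i, z in enumerate(originals):
        # Temperature-scaled similarity of sentence i to every complement.
        logits = [cosine(z, c) / tau for c in complements]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Equals -log softmax(logits)[i]: small when the positive pair dominates.
        total += log_denom - logits[i]
    return total / len(originals)


if __name__ == "__main__":
    originals = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # complements close to their own sentence
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # complements matched to the wrong sentence
    print(info_nce_loss(originals, aligned) < info_nce_loss(originals, swapped))
```

Minimizing such a loss rewards encodings in which an original sentence is closer to its own complement than to any other complement in the batch, which is one way to read "guide the model to obtain more semantically meaningful encoding of the original sentence."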