Measuring the semantic similarity between two sentences remains an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not use word order, which makes it hard to distinguish sentences that share many similar words even when they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT's self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embeddings and of the SAMs when calculating the optimal transport between two sentences. Experiments demonstrate that the proposed method improves WMD and its variants in paraphrase identification while achieving near-equivalent performance in semantic textual similarity. Our code is available at \url{https://github.com/ymgw55/WSMD}.