Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on certain predefined aspects. Thus, similarity predictions of texts are more targeted to specific requirements and more easily explainable. In this paper, we present AspectCSE, an approach for aspect-based contrastive learning of sentence embeddings. Results indicate that AspectCSE achieves an average improvement of 3.97% on information retrieval tasks across multiple aspects compared to the previous best results. We also propose using Wikidata knowledge graph properties to train models of multi-aspect sentence embeddings in which multiple specific aspects are simultaneously considered during similarity predictions. We demonstrate that multi-aspect embeddings outperform single-aspect embeddings on aspect-specific information retrieval tasks. Finally, we examine the aspect-based sentence embedding space and demonstrate that embeddings of semantically similar aspect labels are often close, even without explicit similarity training between different aspect labels.
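To make the idea of aspect-based contrastive learning concrete, the following is a minimal sketch of a generic InfoNCE-style objective with in-batch negatives. It is an illustration only: the abstract does not specify the loss, so the objective, the function names `cosine` and `info_nce_loss`, the temperature value, and the toy vectors are all assumptions rather than the AspectCSE implementation. The sketch assumes that each anchor is paired with a positive text that is similar with respect to one chosen aspect, and that the other positives in the batch act as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    anchors[i] and positives[i] are assumed to be embeddings of two texts
    that are similar with respect to the chosen aspect; every positives[j]
    with j != i serves as an in-batch negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d embeddings (made up for illustration, not real model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive lies near its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive lies near the wrong anchor

loss_aligned = info_nce_loss(anchors, aligned)
loss_swapped = info_nce_loss(anchors, swapped)
```

In this toy setup the loss is lower when each positive lies close to its own anchor than when the pairs are swapped, which is the behavior a contrastive objective rewards. Under the same assumptions, training on pairs drawn from a different aspect would pull together a different set of texts, yielding an aspect-specific embedding space.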