Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs, but they cannot adequately represent the abundant long-tail entities in real-world KGs, which have limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones and suffer from problems such as expensive negative sampling and restrictive description requirements. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming both to enrich the representations of long-tail entities and to solve the problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks for link prediction and triple classification, especially for long-tail entities.
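The sketch below is not LMKE's implementation. It is a generic illustration of why a contrastive formulation can be cheaper than explicit negative sampling. It assumes that a language model has already encoded each (head, relation) query and each candidate entity description into a fixed-size vector, represented here as plain lists of floats. The function names `in_batch_contrastive_loss` and `rank_of_target` and the `temperature` parameter are hypothetical. Within a batch, the true tail of each query doubles as a negative for every other query, so no additional negative triples need to be encoded. At evaluation time, each entity is encoded once and then reused for every query.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Similarity between a query embedding and an entity embedding."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(
    queries: List[Vector], targets: List[Vector], temperature: float = 1.0
) -> float:
    """InfoNCE-style loss with in-batch negatives (illustrative assumption).

    queries[i] stands for a hypothetical encoding of a (head, relation) query and
    targets[i] for the encoding of its true tail entity; every other target in
    the batch is treated as a negative, so no extra negatives are encoded.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the correct label i
    return total / len(queries)


def rank_of_target(query: Vector, entities: List[Vector], target: int) -> int:
    """Link-prediction rank: entity embeddings are computed once and reused."""
    target_score = dot(query, entities[target])
    return 1 + sum(1 for e in entities if dot(query, e) > target_score)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
    swapped = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True: matching query/tail pairs give a lower loss
    print(rank_of_target([1.0, 0.0], [[0.5, 0.0], [2.0, 0.0], [0.0, 1.0]], target=0))  # 2
```

In this sketch, the efficiency gain comes from reusing encodings. Scoring a candidate costs one dot product rather than a fresh language-model forward pass for each (head, relation, candidate) triple.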