Entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that refer to the same real-world entity. Many machine learning-based methods have been proposed for this task. However, to the best of our knowledge, all existing methods require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose AutoAlign, the first fully automatic alignment method, which requires no manually crafted seed alignments. For predicate embeddings, AutoAlign uses large language models to construct a predicate-proximity graph that automatically captures the similarity between predicates across the two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be performed without manually crafted seed alignments. AutoAlign is not only fully automatic but also highly effective: experiments on real-world KGs show that it significantly improves entity alignment performance compared to state-of-the-art methods.
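To make the TransE step mentioned above more concrete, the following is a minimal sketch of the standard TransE scoring idea, in which a triple (h, r, t) is considered plausible when h + r is close to t in the embedding space. It is not AutoAlign's implementation: the function names `transe_score` and `margin_loss`, the two-dimensional toy embeddings, and the example entities are all assumptions chosen for illustration, and no training loop is shown.

```python
import math


def transe_score(h, r, t):
    """Return the L2 distance ||h + r - t||; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss between a true triple and a corrupted one.

    The loss is zero once the corrupted triple scores at least `margin`
    worse (farther) than the true triple.
    """
    return max(0.0, margin + transe_score(*positive) - transe_score(*negative))


# Hypothetical toy embeddings, hand-picked so that Paris + capital_of == France.
entity_emb = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [0.0, 0.0],
}
capital_of = [0.0, 1.0]

true_triple = (entity_emb["Paris"], capital_of, entity_emb["France"])
corrupted_triple = (entity_emb["Berlin"], capital_of, entity_emb["France"])

true_score = transe_score(*true_triple)
corrupted_score = transe_score(*corrupted_triple)
loss = margin_loss(true_triple, corrupted_triple)
```

In this toy setup the true triple has a smaller distance than the corrupted one, which is the property TransE training tries to enforce. Because each KG is embedded independently in this way, the two embedding spaces are not comparable until a further step brings them together, which is the role the abstract assigns to attribute-based similarity.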