Knowledge graph completion (KGC) aims to predict missing facts in knowledge graphs (KGs), a crucial task because modern KGs remain largely incomplete. Although training KGC models on multiple aligned KGs can improve performance, previous methods rely on transferring raw data among KGs, which raises privacy concerns. To address this challenge, we propose a new federated learning framework that implicitly aggregates knowledge from multiple KGs without requiring raw data exchange or entity alignment. We treat each KG as a client that trains a local language model through text-based knowledge representation learning. A central server then aggregates the model weights from the clients. Because natural language provides a universal representation, the same knowledge has similar semantic representations across KGs. The aggregated language model can therefore leverage complementary knowledge from multilingual KGs without sharing raw user data. Extensive experiments on a benchmark dataset demonstrate that our method substantially improves KGC on multilingual KGs, achieving performance comparable to state-of-the-art alignment-based models without requiring any labeled alignments or raw user data sharing. Our code will be made publicly available.
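
The server-side aggregation step can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule, so the example below assumes a FedAvg-style average weighted by client size. The function `federated_average`, the parameter names `encoder` and `scorer`, and the two toy clients are hypothetical and stand in for the weights of real language models.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This is an illustrative stand-in for real model weights.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Return the size-weighted average of the clients' parameters.

    Assumption: a FedAvg-style rule. Only weights and sizes reach the
    server; no triples or entity alignments are exchanged.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total client size must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients, e.g. one per language-specific KG.
english_kg = {"encoder": [1.0, 2.0], "scorer": [0.0]}
french_kg = {"encoder": [3.0, 4.0], "scorer": [1.0]}

# The second client holds three times as much training data as the first.
global_weights = federated_average([english_kg, french_kg], [1, 3])
print(global_weights)
```

In a full training loop, the server would send `global_weights` back to every client for the next round of local training, so each KG benefits from the others without exposing its raw data.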