Entity alignment (EA) aims to merge two knowledge graphs (KGs) by identifying equivalent entity pairs. Existing methods rely heavily on human-generated labels, yet it is prohibitively expensive to recruit cross-domain experts for annotation in real-world scenarios. The advent of Large Language Models (LLMs) opens new avenues for automating EA annotation, given their strong capability to process semantic information. However, directly applying LLMs to EA is nontrivial: the annotation space in real-world KGs is vast, and LLMs may generate noisy labels that mislead the alignment. To this end, we propose a unified framework, LLM4EA, that effectively leverages LLMs for EA. Specifically, we design a novel active learning policy that significantly reduces the annotation space by prioritizing the most valuable entities based on the entire inter-KG and intra-KG structure. Moreover, we introduce an unsupervised label refiner that continuously improves label accuracy through in-depth probabilistic reasoning. We iteratively optimize the policy based on feedback from a base EA model. Extensive experiments on four benchmark datasets demonstrate the advantages of LLM4EA in terms of effectiveness, robustness, and efficiency.