Entity alignment (EA) aims to merge two knowledge graphs (KGs) by identifying equivalent entity pairs. Existing methods rely heavily on human-generated labels, but in real-world scenarios it is prohibitively expensive to recruit cross-domain experts for annotation. The advent of Large Language Models (LLMs), with their comprehensive capability to process semantic information, opens new avenues for automating EA annotation. However, applying LLMs directly to EA is nontrivial: the annotation space in real-world KGs is large, and LLMs may generate noisy labels that mislead the alignment. To this end, we propose a unified framework, LLM4EA, to effectively leverage LLMs for EA. Specifically, we design a novel active learning policy that significantly reduces the annotation space by prioritizing the most valuable entities based on the entire inter-KG and intra-KG structure. Moreover, we introduce an unsupervised label refiner that continuously enhances label accuracy through in-depth probabilistic reasoning. We iteratively optimize the policy based on feedback from a base EA model. Extensive experiments on four benchmark datasets demonstrate the advantages of LLM4EA in terms of effectiveness, robustness, and efficiency. Code is available at https://github.com/chensyCN/llm4ea_official.
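The iterative select, annotate, refine, and train loop summarized above can be illustrated with a toy sketch. This is not the LLM4EA implementation. It assumes a ranking by node degree times a feedback weight as a stand-in for the structure-aware active learning policy, and a one-to-one conflict filter (keep the most confident source per target entity) as a stand-in for the probabilistic label refiner. The names `select_entities`, `refine_labels`, `run_loop`, `annotate`, and `train_and_score` are hypothetical. `annotate` stands in for the LLM annotator and `train_and_score` for the base EA model's feedback.

```python
from typing import Callable, Dict, List, Tuple

Label = Tuple[str, float]  # hypothetical: (target entity, annotator confidence)


def select_entities(unlabeled: List[str], degree: Dict[str, int],
                    weight: Dict[str, float], budget: int) -> List[str]:
    # Toy policy (assumption): rank by degree * feedback weight, break ties by name.
    ranked = sorted(unlabeled, key=lambda e: (-degree[e] * weight.get(e, 1.0), e))
    return ranked[:budget]


def refine_labels(raw: Dict[str, Label]) -> Dict[str, str]:
    # Toy refiner (assumption): enforce one-to-one alignment by keeping,
    # for each target entity, only the most confident source entity.
    best: Dict[str, Tuple[str, float]] = {}
    for src, (tgt, conf) in raw.items():
        if tgt not in best or conf > best[tgt][1]:
            best[tgt] = (src, conf)
    return {src: tgt for tgt, (src, _) in best.items()}


def run_loop(sources: List[str], degree: Dict[str, int],
             annotate: Callable[[str], Label],
             train_and_score: Callable[[Dict[str, str]], Dict[str, float]],
             rounds: int, budget: int) -> Dict[str, str]:
    weight: Dict[str, float] = {}
    raw: Dict[str, Label] = {}
    for _ in range(rounds):
        unlabeled = [e for e in sources if e not in raw]
        # Query the (hypothetical) annotator only for the selected entities.
        for e in select_entities(unlabeled, degree, weight, budget):
            raw[e] = annotate(e)
        refined = refine_labels(raw)
        # Feedback from the (hypothetical) base EA model reweights the policy.
        weight = train_and_score(refined)
    return refine_labels(raw)


if __name__ == "__main__":
    sources = ["a1", "a2", "a3", "a4"]
    degree = {"a1": 3, "a2": 5, "a3": 1, "a4": 2}
    noisy = {"a1": ("b1", 0.9), "a2": ("b2", 0.8), "a3": ("b1", 0.4), "a4": ("b4", 0.7)}
    result = run_loop(sources, degree, noisy.__getitem__,
                      lambda labels: {e: 1.0 for e in sources}, rounds=2, budget=2)
    print(result)
```

In this toy run, the noisy label `a3 -> b1` conflicts with the more confident `a1 -> b1` and is discarded by the refiner, so the returned alignment contains only `a1 -> b1`, `a2 -> b2`, and `a4 -> b4`.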