Entity alignment, which matches entities that share the same semantics, is crucial for merging knowledge across knowledge graphs. The standard approach uses semi-supervised learning to match entities by the similarity of their embeddings. However, diverse data sources give aligned entities non-isomorphic neighborhood structures, which complicates alignment, especially for less common and sparsely connected entities. This paper presents a soft label propagation framework that integrates multi-source data with iterative seed enhancement and addresses the scalability challenges of handling large datasets, where large-scale computing excels. The framework uses seeds as anchors and selects optimal relation pairs to create soft labels rich in neighborhood features and semantic relation information. A bidirectional weighted joint loss function reduces the distance between positive samples and treats negative samples differentially, taking the non-isomorphic neighborhood structures into account. The method outperforms existing semi-supervised approaches on multiple datasets, significantly improving the quality of entity alignment.
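To make the idea of embedding-based matching with iterative seed enhancement concrete, here is a minimal sketch of one common way to propose new seed pairs: keep only pairs of entities that are each other's nearest neighbor under cosine similarity. The abstract does not state the selection rule the framework uses, so the mutual-nearest-neighbor criterion, the function names `cosine` and `mutual_nearest_pairs`, and the `threshold` value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_nearest_pairs(src, tgt, threshold=0.9):
    """Propose candidate seed pairs (i, j) between two embedding lists.

    A pair is kept only if src[i] and tgt[j] are each other's nearest
    neighbor and their similarity is at least `threshold`.
    Both the criterion and the threshold are illustrative assumptions.
    """
    if not src or not tgt:
        return []
    # Full similarity matrix: sim[i][j] compares src[i] with tgt[j].
    sim = [[cosine(u, v) for v in tgt] for u in src]
    # Nearest target for every source entity (first index wins on ties).
    best_tgt = [max(range(len(tgt)), key=lambda j: row[j]) for row in sim]
    # Nearest source for every target entity (first index wins on ties).
    best_src = [
        max(range(len(src)), key=lambda i: sim[i][j]) for j in range(len(tgt))
    ]
    return [
        (i, j)
        for i, j in enumerate(best_tgt)
        if best_src[j] == i and sim[i][j] >= threshold
    ]


if __name__ == "__main__":
    # Toy embeddings for two small graphs (made-up values).
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tgt = [[0.9, 0.1], [0.1, 0.9]]
    print(mutual_nearest_pairs(src, tgt))
```

In an iterative setup, pairs returned by such a step would be added to the seed set before the embeddings are retrained. Requiring agreement in both directions is a conservative choice: the third source entity in the toy data is left unmatched because its nearest target already prefers another source.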