Toward Practical Entity Alignment Method Design: Insights from New Highly Heterogeneous Knowledge Graph Datasets

The flourishing of knowledge graph (KG) applications has driven the need for entity alignment (EA) across KGs. However, the heterogeneity of practical KGs, characterized by differing scales, structures, and limited overlapping entities, greatly surpasses that of existing EA datasets. This discrepancy shows that current EA datasets oversimplify heterogeneity, which obstructs a full understanding of the advancements achieved by recent EA methods. In this paper, we study the performance of EA methods in practical settings, focusing specifically on the alignment of highly heterogeneous KGs (HHKGs). First, we address the oversimplified heterogeneity settings of current datasets and propose two new HHKG datasets that closely mimic practical EA scenarios. Then, based on these datasets, we conduct extensive experiments to evaluate representative existing EA methods. Our findings reveal that, when aligning HHKGs, valuable structure information can hardly be exploited through message-passing and aggregation mechanisms. This phenomenon leads to the inferior performance of existing EA methods, especially those based on graph neural networks (GNNs). These findings shed light on the potential problems of conventionally applying GNN-based methods as a panacea for all EA datasets. Consequently, in light of these observations and to elucidate which EA methodology is genuinely beneficial in practical scenarios, we undertake an in-depth analysis by implementing a simple but effective approach: Simple-HHEA. This method adaptively integrates entity name, structure, and temporal information to navigate the challenges posed by HHKGs. Our experimental results indicate that the key to future EA model design in practice lies in the models' adaptability and efficiency under varying information quality conditions, as well as their capability to capture patterns across HHKGs.
