The development of knowledge graph (KG) applications has led to a rising need for entity alignment (EA) between heterogeneous KGs extracted from various sources. Recently, graph neural networks (GNNs) have been widely adopted in EA tasks owing to their strong ability to capture structural information. However, we observe that the oversimplified settings of existing common EA datasets are far from real-world scenarios, which obstructs a full understanding of the advances achieved by recent methods. This observation raises the question: do existing GNN-based EA methods really make great progress? In this paper, to study the performance of EA methods in realistic settings, we focus on the alignment of highly heterogeneous KGs (HHKGs) (e.g., event KGs and general KGs), which differ in scale and structure and share fewer overlapping entities. First, we remove the unreasonable settings and propose two new HHKG datasets that closely mimic real-world EA scenarios. Then, based on the proposed datasets, we conduct extensive experiments to evaluate previous representative EA methods and reveal interesting findings about the progress of GNN-based EA methods. We find that structural information becomes difficult to exploit but remains valuable in aligning HHKGs. This difficulty leads to inferior performance of existing EA methods, especially GNN-based ones. Our findings shed light on the potential problems of impulsively applying GNN-based methods as a panacea for all EA datasets. Finally, we introduce a simple but effective method, Simple-HHEA, which comprehensively utilizes entity name, structure, and temporal information. Experimental results show that Simple-HHEA outperforms previous models on HHKG datasets.