We investigate the entity alignment problem with unlabeled dangling cases, meaning that some entities in the source or target graph have no counterparts in the other graph, and those entities remain unlabeled. The problem arises when the source and target graphs differ in scale and labeling matchable pairs is much cheaper than labeling dangling entities. To address it, we propose a novel GNN-based framework for dangling detection and entity alignment. The two tasks share the same GNN and are trained jointly, and detected dangling entities are removed during alignment. The framework features an entity and relation attention mechanism for selective neighborhood aggregation in representation learning, as well as a positive-unlabeled learning loss for an unbiased estimation of dangling entities. Experimental results show that each component of our design contributes to the overall alignment performance, which is comparable or superior to that of the baselines, even when the baselines additionally have 30% of the dangling entities labeled as training data.
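To make the positive-unlabeled idea concrete, the following is a minimal sketch of a standard unbiased PU risk estimator. It is not the loss used in the paper, whose exact form the abstract does not give. It assumes that labeled matchable entities play the role of positives, that dangling entities are the hidden negatives inside the unlabeled set, that the class prior `prior` (the fraction of matchable entities) is known, and that a sigmoid surrogate loss is used. The names `sigmoid_loss` and `unbiased_pu_risk` are hypothetical.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss 1 / (1 + exp(label * score)), with label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def unbiased_pu_risk(pos_scores, unl_scores, prior):
    """Estimate the classification risk from positive and unlabeled scores only.

    pos_scores: classifier scores of labeled positives (assumed: matchable entities).
    unl_scores: classifier scores of unlabeled entities (matchable and dangling mixed).
    prior:      assumed-known fraction of positives in the unlabeled population.
    """
    if not pos_scores or not unl_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")

    # Risk of positives when treated as positive.
    pos_as_pos = _mean(sigmoid_loss(s, +1) for s in pos_scores)
    # Risk of positives when treated as negative; subtracting it corrects
    # for the positives hidden inside the unlabeled set.
    pos_as_neg = _mean(sigmoid_loss(s, -1) for s in pos_scores)
    # Risk of unlabeled entities when treated as negative.
    unl_as_neg = _mean(sigmoid_loss(s, -1) for s in unl_scores)

    return prior * pos_as_pos + unl_as_neg - prior * pos_as_neg


if __name__ == "__main__":
    # Scores are made up for illustration only.
    print(unbiased_pu_risk([2.0, 1.5], [1.8, -2.2, -1.9], prior=0.4))
```

The subtraction of `prior * pos_as_neg` is what allows the estimate to stay unbiased without any labeled negatives. On small samples the estimate can become negative, which is why non-negative variants of this estimator are often used in practice.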