Cross-lingual entity alignment (EA) integrates multiple knowledge graphs (KGs) across different languages, giving users seamless access to diverse and comprehensive knowledge. Most existing methods are supervised and therefore depend on labeled entity pairs, which are difficult to obtain. To address this, recent studies have shifted towards self-supervised and unsupervised frameworks. Despite their effectiveness, these approaches have three limitations: (1) Relation passing: they focus mainly on entities and neglect the semantic information of relations; (2) Isomorphic assumption: they assume that the source and target graphs are isomorphic, which introduces noise and reduces alignment accuracy; and (3) Noise vulnerability: they are susceptible to noise in the textual features, especially in the presence of inconsistent translations or Out-of-Vocabulary (OOV) problems. In this paper, we propose ERAlign, an unsupervised and robust cross-lingual EA pipeline that jointly performs Entity-level and Relation-level Alignment through a neighbor triple matching strategy that uses the semantic textual features of relations and entities. Its refinement step iteratively improves the results by fusing the entity-level and relation-level alignments based on neighbor triple matching. An additional verification step examines the entities' neighbor triples as linearized text. This Align-then-Verify pipeline rigorously assesses the alignment results, achieving near-perfect alignment even when the textual features of entities are noisy. Our extensive experiments demonstrate that ERAlign is robust and generally applicable, and that it improves the accuracy and effectiveness of EA, contributing significantly to knowledge-oriented applications.
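To make the idea of neighbor triple matching concrete, the following is a minimal sketch and not the ERAlign implementation. It assumes that entity and relation similarities are already available as dictionaries (for example, from text embeddings), considers only outgoing triples, and uses a hypothetical scoring rule: each source triple is matched to its best target triple by the product of relation and tail-entity similarity, and the averaged evidence is blended with the textual similarity through an assumed weight `alpha`. All function names, identifiers, and toy scores are illustrative.

```python
from collections import defaultdict


def neighbor_triples(triples):
    """Map each head entity to its outgoing (relation, tail) pairs."""
    out = defaultdict(list)
    for head, rel, tail in triples:
        out[head].append((rel, tail))
    return out


def refine_entity_sim(ent_sim, rel_sim, src_triples, tgt_triples, alpha=0.5):
    """One refinement round (assumed rule): blend the textual similarity of
    an entity pair with evidence from its best-matching neighbor triples."""
    src_nb = neighbor_triples(src_triples)
    tgt_nb = neighbor_triples(tgt_triples)
    refined = {}
    for (e_s, e_t), base in ent_sim.items():
        # For each source triple, keep the score of the best target triple;
        # a triple pair scores high only if relation AND tail are similar.
        scores = [
            max(
                (rel_sim.get((r_s, r_t), 0.0) * ent_sim.get((t_s, t_t), 0.0)
                 for r_t, t_t in tgt_nb.get(e_t, [])),
                default=0.0,
            )
            for r_s, t_s in src_nb.get(e_s, [])
        ]
        if scores:
            triple_score = sum(scores) / len(scores)
            refined[(e_s, e_t)] = (1 - alpha) * base + alpha * triple_score
        else:
            # No outgoing triples on the source side: keep the textual score.
            refined[(e_s, e_t)] = base
    return refined


def best_match(sim, e_s):
    """Return the target entity with the highest similarity to e_s."""
    candidates = {t: s for (s_ent, t), s in sim.items() if s_ent == e_s}
    return max(candidates, key=candidates.get)


# Toy data: a noisy label makes the wrong target "t3" look closest to "s1".
ent_sim = {
    ("s1", "t1"): 0.4, ("s1", "t3"): 0.5,
    ("s2", "t2"): 0.9, ("s2", "t4"): 0.1,
}
rel_sim = {("capital_of", "capitale_de"): 0.9, ("capital_of", "situe_en"): 0.2}
src_triples = [("s1", "capital_of", "s2")]
tgt_triples = [("t1", "capitale_de", "t2"), ("t3", "situe_en", "t4")]

refined = refine_entity_sim(ent_sim, rel_sim, src_triples, tgt_triples)
print(best_match(ent_sim, "s1"), best_match(refined, "s1"))  # t3, then t1
```

In this toy case the textual score alone prefers the wrong target, while the triple-level evidence (a similar relation pointing to a confidently aligned tail) corrects the match. A fuller pipeline would iterate this step, update the relation similarities as well, and then verify the surviving candidates, as the abstract describes.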