As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information. However, existing MMEA approaches concentrate primarily on the fusion paradigm for multi-modal entity features, while neglecting the challenges posed by the pervasive absence and intrinsic ambiguity of visual images. In this paper, we present a further analysis of visual modality incompleteness, benchmarking the latest MMEA models on our proposed dataset MMEA-UMVM, which covers both bilingual and monolingual alignment KGs and evaluates model performance under standard (non-iterative) and iterative training paradigms. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise and exhibit performance oscillations or declines at high rates of missing modality. This shows that the inclusion of additional multi-modal data can sometimes adversely affect EA. To address these challenges, we introduce UMAEA, a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities. It consistently achieves SOTA performance across all 97 benchmark splits, significantly surpassing existing baselines with limited parameters and time consumption, while effectively alleviating the identified limitations of other models. Our code and benchmark data are available at https://github.com/zjukg/UMAEA.