As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information. However, existing MMEA approaches primarily concentrate on the fusion paradigm of multi-modal entity features, while neglecting the challenges posed by pervasively missing visual images and by their intrinsic ambiguity. In this paper, we present a further analysis of visual modality incompleteness, benchmarking the latest MMEA models on our proposed dataset MMEA-UMVM, which covers both bilingual and monolingual alignment KGs and evaluates model performance under standard (non-iterative) and iterative training paradigms. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise and exhibit performance oscillations or declines at high rates of missing modality. This shows that the inclusion of additional multi-modal data can sometimes adversely affect EA. To address these challenges, we introduce UMAEA, a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities. It consistently achieves state-of-the-art (SOTA) performance across all 97 benchmark splits, significantly surpassing existing baselines while using limited parameters and time, and effectively alleviating the identified limitations of other models. Our code and benchmark data are available at https://github.com/zjukg/UMAEA.