Multi-modal entity alignment (MMEA) aims to identify equivalent entities across two multi-modal knowledge graphs so that the graphs can be integrated. Prior work has focused on improving the interaction and fusion of multi-modal information. In doing so, it has overlooked two things: the influence of modal-specific noise, and how to use both labeled and unlabeled data in semi-supervised settings. In this work, we introduce Pseudo-label Calibration Multi-modal Entity Alignment (PCMEA), a semi-supervised approach. Specifically, to generate holistic entity representations, we first design several embedding modules and attention mechanisms that extract visual, structural, relational, and attribute features. Unlike prior methods that fuse these features directly, we then use mutual information maximization to filter out modal-specific noise and to strengthen modal-invariant commonality. Next, we combine pseudo-label calibration with momentum-based contrastive learning to make full use of labeled and unlabeled data. This improves the quality of the pseudo-labels and pulls aligned entities closer together. Finally, extensive experiments on two MMEA datasets demonstrate the effectiveness of PCMEA, which achieves state-of-the-art performance.
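The abstract names its building blocks but does not give PCMEA's actual losses or calibration rule. As a hedged illustration only, the sketch below shows three generic techniques of the kind described: an InfoNCE loss (a standard contrastive objective whose minimization maximizes a lower bound on mutual information between two views), a MoCo-style momentum (exponential moving average) update of a key encoder, and pseudo-label proposal by mutual nearest neighbours with a similarity threshold. The function names, the temperature, the momentum coefficient, and the threshold are all assumptions for illustration, not PCMEA's design.

```python
import numpy as np

# Hypothetical helpers illustrating generic techniques; NOT the PCMEA implementation.

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `anchors` should match row i of `positives`;
    every other row in `positives` acts as a negative."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                 # (n, n) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy with the diagonal as targets

def momentum_update(key_params, query_params, m=0.999):
    """EMA update: key <- m * key + (1 - m) * query, applied per named parameter."""
    return {name: m * key_params[name] + (1.0 - m) * query_params[name]
            for name in key_params}

def mutual_nn_pseudo_labels(src, tgt, threshold=0.5):
    """Propose (i, j) pairs where source entity i and target entity j are each
    other's nearest neighbour and their cosine similarity reaches `threshold`."""
    sim = l2_normalize(src) @ l2_normalize(tgt).T
    best_tgt = sim.argmax(axis=1)   # nearest target for every source entity
    best_src = sim.argmax(axis=0)   # nearest source for every target entity
    return [(i, int(j)) for i, j in enumerate(best_tgt)
            if best_src[j] == i and sim[i, j] >= threshold]

# Toy usage: target embeddings are near-copies of the source ones,
# so the true alignment is i <-> i.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8))
tgt = src + 0.01 * rng.normal(size=(5, 8))
pairs = mutual_nn_pseudo_labels(src, tgt)
aligned_loss = info_nce(src, tgt)
shuffled_loss = info_nce(src, tgt[::-1])   # mismatched positives should cost more
```

The mutual-nearest-neighbour check is one common way to keep only confident pseudo-labels; a calibrated method would typically refine these proposals further, and the threshold here is an arbitrary placeholder.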