Multi-Modal Entity Alignment (MMEA) is a critical task that aims to identify equivalent entity pairs across multi-modal knowledge graphs (MMKGs). The task is challenging because the graphs carry heterogeneous information, including neighboring entities, multi-modal attributes, and entity types. Directly combining this information (e.g., by concatenation or attention) can lead to an unaligned information space. To address these challenges, we propose a novel MMEA transformer, called MoAlign, that hierarchically introduces neighbor features, multi-modal attributes, and entity types to enhance the alignment task. Taking advantage of the transformer's ability to integrate multiple kinds of information, we design a hierarchical modifiable self-attention block in a transformer encoder to preserve the unique semantics of each kind of information. Furthermore, we design two entity-type prefix injection methods that integrate entity-type information using type prefixes, which help to restrict the global information of entities not present in the MMKGs. Extensive experiments on benchmark datasets demonstrate that our approach outperforms strong competitors and achieves excellent entity alignment performance.
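To make the idea of injecting entity-type information through prefixes more concrete, the following is a minimal sketch in the spirit of generic prefix-tuning, not the MoAlign implementation. It assumes that type prefixes are extra vectors prepended to the keys and values of a single-head self-attention layer, so that every entity token can attend to them. The function names (`attend`, `attend_with_type_prefix`), the toy feature vectors, and the prefix values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def attend_with_type_prefix(tokens, type_prefix):
    """Assumption: prefixes extend keys and values only, not queries.

    The output therefore keeps one vector per original token, but each
    token can also attend to the (hypothetical) type-prefix vectors.
    """
    keys = type_prefix + tokens
    values = type_prefix + tokens
    return attend(tokens, keys, values)


# Toy features for one entity (hypothetical values).
entity_tokens = [[1.0, 0.0], [0.0, 1.0]]
# One hypothetical prefix vector standing in for an entity type.
type_prefix = [[0.5, 0.5]]

plain = attend(entity_tokens, entity_tokens, entity_tokens)
prefixed = attend_with_type_prefix(entity_tokens, type_prefix)
```

In this sketch the prefix changes what each token attends to without changing the number of output vectors, which is one simple way type information could condition the representation. How MoAlign actually constructs and injects its two kinds of type prefixes is described in the paper itself and is not reproduced here.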