Link prediction aims to identify potentially missing triples in knowledge graphs. To improve results, some recent studies have introduced multimodal information into link prediction. However, these methods use each modality separately and neglect the complex interactions between different modalities. In this paper, we aim to better model inter-modality information and introduce a novel Interactive Multimodal Fusion (IMF) model that integrates knowledge from different modalities. To this end, we propose a two-stage multimodal fusion framework that preserves modality-specific knowledge while exploiting the complementarity between modalities. Instead of directly projecting different modalities into a unified space, our multimodal fusion module keeps the representations of different modalities independent, leverages bilinear pooling for fusion, and incorporates contrastive learning as an additional constraint. Furthermore, the decision fusion module computes a learned weighted average over the predictions of all modalities to better exploit their complementarity. Empirical evaluations on several real-world datasets demonstrate the effectiveness of our approach. The implementation code is available online at https://github.com/HestiaSky/IMF-Pytorch.
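The abstract names three mechanisms: bilinear pooling to fuse modality representations that are kept separate, a contrastive constraint, and a decision-level learned weighted average over per-modality predictions. The following is a minimal, dependency-free Python sketch of those three ideas under stated assumptions. It is not the IMF implementation from the linked repository. Full-rank bilinear pooling, an InfoNCE-style cosine loss with temperature `tau`, and softmax-normalized modality weights are illustrative choices. The function names `bilinear_fusion`, `info_nce`, and `decision_fusion` are hypothetical. Which representation pairs IMF actually contrasts is not stated in the abstract, so the positive/negative roles here are also an assumption.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def bilinear_fusion(x: Sequence[float], y: Sequence[float], W: Sequence[Matrix]) -> Vector:
    """Full bilinear pooling (assumed variant): output k is x^T W[k] y.

    W has shape (out_dim, len(x), len(y)), so every pair (x_i, y_j) of the two
    modality vectors interacts, while x and y themselves stay separate inputs.
    """
    return [
        sum(x[i] * Wk[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
        for Wk in W
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb)


def info_nce(anchor, positive, negatives, tau: float = 0.1) -> float:
    """InfoNCE-style loss (assumed form): -log softmax score of the positive.

    Lower when the anchor is closer to the positive than to the negatives.
    """
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[0]


def decision_fusion(scores: Sequence[Sequence[float]], raw_weights: Sequence[float]) -> Vector:
    """Weighted average of per-modality candidate scores.

    scores[m][c] is modality m's score for candidate entity c. raw_weights are
    hypothetical learnable logits, softmax-normalized so the weights sum to 1.
    """
    m = max(raw_weights)
    exps = [math.exp(w - m) for w in raw_weights]
    total = sum(exps)
    alphas = [e / total for e in exps]
    n_candidates = len(scores[0])
    return [sum(a * s[c] for a, s in zip(alphas, scores)) for c in range(n_candidates)]


if __name__ == "__main__":
    structural, visual = [1.0, 0.0], [0.0, 1.0]  # toy per-modality embeddings
    W = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
    print(bilinear_fusion(structural, visual, W))  # [0.0, 1.0]
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))  # small positive loss
    print(decision_fusion([[0.2, 0.8], [0.6, 0.4]], [0.0, 0.0]))  # roughly [0.4, 0.6]
```

The design choice in this sketch mirrors the abstract's contrast with "directly projecting different modalities into a unified space": each modality keeps its own vector, and interaction happens only through the bilinear term, while the decision stage combines modality-level predictions rather than features. In practice a framework such as PyTorch would learn `W` and `raw_weights` by gradient descent; that training loop is omitted here.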