Multimodal Knowledge Graph Construction (MMKC) is the process of creating a structured representation of entities and relations from multiple modalities, such as text, images, and videos. However, because the real world is dynamic, existing MMKC models are limited in handling newly introduced entities and relations. Moreover, most state-of-the-art MMKC studies consider entity and relation extraction only in static scenarios, while the current continual setting for knowledge graph construction considers only entity and relation extraction from text data, neglecting other multimodal sources. There is therefore a need to study continual multimodal knowledge graph construction, which must mitigate catastrophic forgetting and retain knowledge previously extracted from different forms of data. We investigate this problem by developing lifelong multimodal benchmark datasets. We find empirically that, in a continual setting, several state-of-the-art MMKC models trained on multimedia data can unexpectedly underperform models that use only textual resources. Motivated by this finding, we propose a Lifelong MultiModal Consistent Transformer Framework (LMC) for continual multimodal knowledge graph construction. By bringing the advantages of consistent KGC strategies into continual learning, LMC achieves a better balance between stability and plasticity. Our experiments show that our method outperforms prevailing continual learning techniques and multimodal approaches in dynamic scenarios. Code and datasets are available at https://github.com/zjunlp/ContinueMKGC.