Previous successful approaches to missing modality completion rely on carefully designed fusion techniques and extensive pre-training on complete data, which can limit their generalizability in out-of-domain (OOD) scenarios. In this study, we pose a new challenge: can we develop a missing modality completion model that is both resource-efficient and robust to OOD generalization? To address this, we present a training-free framework for missing modality completion that leverages large multimodal models (LMMs). Our approach, termed the "Knowledge Bridger", is modality-agnostic and integrates generation and ranking of missing modalities. By defining domain-specific priors, our method automatically extracts structured information from available modalities to construct knowledge graphs. These extracted graphs connect the missing modality generation and ranking modules through the LMM, resulting in high-quality imputations of missing modalities. Experimental results across both general and medical domains show that our approach consistently outperforms competing methods, including in OOD generalization. Additionally, our knowledge-driven generation and ranking techniques demonstrate superiority over variants that directly employ LMMs for generation and ranking, offering insights that may be valuable for applications in other domains.
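The pipeline described above — extract a knowledge graph from the available modalities, generate candidates for the missing modality, then rank them by consistency with the graph — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the extractor, generator, and ranker below are hypothetical stand-ins for the LMM-driven modules, and all names (`extract_graph`, `generate_candidates`, `rank_candidates`) are assumptions.

```python
# Toy sketch of the "extract -> generate -> rank" knowledge-bridging
# pipeline. Every component here is a hypothetical stand-in for an
# LMM call; it only illustrates the data flow, not the actual method.

from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # (subject, relation, object) triples extracted from an available modality.
    triples: set = field(default_factory=set)


def extract_graph(text: str) -> KnowledgeGraph:
    # Toy extractor: treat each "subject relation object" clause as a triple.
    # In the framework, this step would be performed by an LMM guided by
    # domain-specific priors.
    triples = set()
    for clause in text.split(";"):
        parts = clause.strip().split()
        if len(parts) == 3:
            triples.add(tuple(parts))
    return KnowledgeGraph(triples)


def generate_candidates(graph: KnowledgeGraph, n: int = 3) -> list:
    # Toy generator: emit up to n candidate descriptions of the missing
    # modality, each covering a growing subset of the extracted triples.
    triples = sorted(graph.triples)
    return [
        " ; ".join(" ".join(t) for t in triples[: k + 1])
        for k in range(min(n, len(triples)))
    ]


def rank_candidates(graph: KnowledgeGraph, candidates: list) -> list:
    # Toy ranker: score each candidate by how many extracted triples it
    # is consistent with, and return candidates best-first.
    def score(candidate: str) -> int:
        return sum(1 for t in graph.triples if " ".join(t) in candidate)

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    available_text = "dog chases ball ; dog wears collar"
    kg = extract_graph(available_text)
    best = rank_candidates(kg, generate_candidates(kg))[0]
    print(best)  # the candidate consistent with the most triples
```

In this sketch the knowledge graph plays the bridging role the abstract describes: the same structured triples condition both generation (what the candidates should contain) and ranking (how candidates are scored), rather than prompting an LMM end-to-end.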