Multimodal learning, which integrates data from diverse sensory modalities, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle when some modalities dominate others during training, which leads to suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Unimodal Adaptation). MLA reframes conventional joint multimodal learning as an alternating unimodal learning process, thereby minimizing interference between modalities. At the same time, it captures cross-modal interactions through a shared head that is optimized continuously across the different modalities. A gradient modification mechanism controls this optimization so that the shared head does not lose previously acquired information. At inference time, MLA uses a test-time uncertainty-based model fusion mechanism to integrate multimodal information. We conduct extensive experiments on five diverse datasets, covering scenarios with complete modalities and scenarios with missing modalities. These experiments demonstrate the superiority of MLA over competing prior approaches.
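To make the test-time fusion step concrete, the following is a minimal sketch of one way uncertainty-based fusion could work. It is not taken from the MLA implementation: the use of prediction entropy as the uncertainty measure, the `exp(-entropy)` weighting, and the names `fuse_by_uncertainty`, `audio_logits`, and `video_logits` are all assumptions made for illustration. The idea shown is simply that a modality whose prediction is less uncertain receives a larger weight in the fused output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_by_uncertainty(logits_per_modality):
    """Fuse per-modality predictions, weighting each by its confidence.

    Assumption for this sketch: uncertainty is measured by prediction
    entropy, and each modality's weight is proportional to exp(-entropy).
    """
    probs = [softmax(logits) for logits in logits_per_modality]
    raw = [math.exp(-entropy(p)) for p in probs]  # lower entropy -> larger weight
    norm = sum(raw)
    weights = [r / norm for r in raw]
    n_classes = len(probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical logits from two unimodal predictions on one test sample.
audio_logits = [4.0, 0.5, 0.1]  # confident in class 0
video_logits = [0.3, 0.4, 0.2]  # close to uniform, so highly uncertain

fused, weights = fuse_by_uncertainty([audio_logits, video_logits])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this example the confident audio prediction outweighs the near-uniform video prediction, so the fused distribution follows the audio modality. Other uncertainty measures or weighting schemes would fit the same structure.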