Multimodal learning with incomplete input data (missing modalities) is both practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on model training, greatly degrading performance when modalities are missing. Motivated by Grad-CAM, we introduce gradients as a novel indicator to monitor and reduce modality dominance, which is widespread in the missing-modality scenario. With the aid of this indicator, we present a novel Gradient-guided Modality Decoupling (GMD) method that decouples the model's dependency on dominant modalities. Specifically, GMD removes the conflicting gradient components of different modalities to achieve this decoupling, significantly improving performance. In addition, to flexibly handle modality-incomplete data, we design a parameter-efficient Dynamic Sharing (DS) framework that adaptively switches network parameters on or off depending on whether each modality is available. We conduct extensive experiments on three popular multimodal benchmarks: BraTS 2018 for medical segmentation, and CMU-MOSI and CMU-MOSEI for sentiment analysis. The results show that our method significantly outperforms competing approaches, demonstrating the effectiveness of the proposed solutions. Our code is available at https://github.com/HaoWang420/Gradient-guided-Modality-Decoupling.
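To make the idea of removing conflicting gradient components concrete, the following is a minimal sketch and not the released implementation. The abstract does not state the exact removal rule, so the sketch assumes a PCGrad-style projection: two gradients are treated as conflicting when their dot product is negative, and the component of one along the other is then subtracted. The function names `remove_conflict` and `decouple`, and the use of plain Python lists as flattened gradients, are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_conflict(g: Sequence[float], h: Sequence[float]) -> List[float]:
    """Return g with its component along h removed, but only if g and h conflict.

    Assumption: "conflict" means a negative dot product (PCGrad-style rule).
    """
    gh = dot(g, h)
    hh = dot(h, h)
    if gh >= 0.0 or hh == 0.0:
        # No conflict (or h is a zero vector): leave g unchanged.
        return list(g)
    scale = gh / hh
    return [a - scale * b for a, b in zip(g, h)]


def decouple(grads: Sequence[Sequence[float]]) -> List[List[float]]:
    """Project each gradient against every other original gradient in turn."""
    adjusted_all = []
    for i, g in enumerate(grads):
        adjusted = list(g)
        for j, h in enumerate(grads):
            if i != j:
                adjusted = remove_conflict(adjusted, h)
        adjusted_all.append(adjusted)
    return adjusted_all


# Hypothetical gradients of a shared parameter under two modality settings.
g_a = [1.0, 0.0]
g_b = [-1.0, 1.0]  # dot(g_a, g_b) = -1.0, so the two gradients conflict
new_a, new_b = decouple([g_a, g_b])
```

In this toy case `new_a` is `[0.5, 0.5]` and `new_b` is `[0.0, 1.0]`: each adjusted gradient is orthogonal to the other original gradient, so a step along it no longer works against the other modality setting to first order. In a real model the vectors would be the flattened parameter gradients computed for each modality combination.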