Multimodal learning seeks to use data from multiple sources to improve performance on downstream tasks. Ideally, redundancy across modalities would make multimodal systems robust to missing or corrupted observations in some of the correlated modalities. However, we observe that the performance of several existing multimodal networks deteriorates significantly when one or more modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge the performance drop caused by missing modalities and, in some cases, outperform independent, dedicated networks trained for the available modality combinations. The proposed adaptation requires an extremely small number of parameters (e.g., fewer than 0.7% of the total parameters) and is applicable to a wide range of modality combinations and tasks. We conduct a series of experiments on five datasets, covering multimodal semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis, to highlight the robustness of the proposed method to missing modalities. The proposed method demonstrates versatility across tasks and datasets and outperforms existing methods for robust multimodal learning with missing modalities.
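To make the idea of modulating intermediate features concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the form of the modulation, so the sketch assumes a per-channel scale and shift applied to the output of a frozen pretrained layer. The class name `ScaleShiftAdapter`, the layer sizes, and the use of NumPy are all illustrative assumptions.

```python
import numpy as np


class ScaleShiftAdapter:
    """Hypothetical per-channel modulation: out = gamma * feat + beta.

    This is an assumed form of feature modulation, chosen for illustration.
    """

    def __init__(self, num_channels: int):
        # Identity initialization, so the frozen network's output is
        # unchanged before any adaptation takes place.
        self.gamma = np.ones(num_channels)
        self.beta = np.zeros(num_channels)

    def __call__(self, feat: np.ndarray) -> np.ndarray:
        # feat has shape (batch, channels, height, width); the scale and
        # shift are broadcast over every axis except the channel axis.
        return self.gamma[None, :, None, None] * feat + self.beta[None, :, None, None]

    def num_params(self) -> int:
        return self.gamma.size + self.beta.size


# Stand-in for one frozen pretrained layer (illustrative sizes only):
# a 3x3 convolution with 64 input and 64 output channels.
frozen_weights = np.zeros((64, 64, 3, 3))
adapter = ScaleShiftAdapter(num_channels=64)

# Random intermediate features standing in for the layer's output.
features = np.random.default_rng(0).normal(size=(2, 64, 8, 8))
modulated = adapter(features)

# Share of parameters that would be learned during adaptation.
fraction = adapter.num_params() / (frozen_weights.size + adapter.num_params())
print(f"adapter parameters: {adapter.num_params()}, fraction of total: {fraction:.4%}")
```

In this sketch only `gamma` and `beta` would be updated while the pretrained weights stay frozen, which is why the share of learned parameters remains small relative to the whole network.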