Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. Ideally, redundancies in the data should make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks deteriorates significantly if one or more modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge the performance drop due to missing modalities and, in some cases, outperform independent, dedicated networks trained for the available modality combinations. The proposed adaptation requires an extremely small number of parameters (e.g., fewer than 1% of the total parameters) and is applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing-modality robustness of our proposed method on five different multimodal tasks across seven datasets. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
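As a rough illustration of what modulating intermediate features can look like, the sketch below applies a learnable per-channel scale and shift to the output of a frozen layer. The abstract does not specify the exact form of the modulation, so the scale-and-shift form, the layer sizes, and all names here (`frozen_layer`, `modulated_layer`, `gamma`, `beta`) are assumptions made for illustration and should not be read as the proposed method itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one frozen layer of a pretrained multimodal network.
# The sizes are hypothetical and chosen only for illustration.
d_in, d_out = 256, 256
W = rng.standard_normal((d_in, d_out)) * 0.02  # frozen weights
b = np.zeros(d_out)                            # frozen bias


def frozen_layer(x):
    """Pretrained linear layer with ReLU; its parameters are never updated."""
    return np.maximum(x @ W + b, 0.0)


# Assumed modulation parameters: one scale and one shift per feature channel.
# Only these would be trained when adapting to a missing-modality setting.
gamma = np.ones(d_out)   # identity scale at initialization
beta = np.zeros(d_out)   # zero shift at initialization


def modulated_layer(x, gamma, beta):
    """Scale and shift the frozen layer's features channel by channel."""
    return gamma * frozen_layer(x) + beta


x = rng.standard_normal((4, d_in))  # a batch of 4 feature vectors
h = modulated_layer(x, gamma, beta)

# Compare the number of adaptation parameters with the frozen ones.
n_frozen = W.size + b.size
n_adapt = gamma.size + beta.size
fraction = n_adapt / (n_frozen + n_adapt)
print(f"adaptation parameters: {n_adapt} ({fraction:.2%} of total)")
```

With the identity initialization, the modulated layer reproduces the frozen layer exactly, so adaptation starts from the pretrained behavior. In this toy setting the scale and shift vectors account for under 1% of the layer's parameters, which is in the spirit of the parameter budget quoted in the abstract, although the real fraction depends on the actual network.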