Multimodal machine learning with missing modalities is an increasingly relevant challenge in applications such as healthcare. This paper extends research on missing modalities to the low-data regime, in which a downstream task suffers from both missing modalities and a limited sample size. This setting is both challenging and practical, since full-modality data and sufficient annotated training samples are often expensive to obtain. We propose retrieval-augmented in-context learning to address these two issues, drawing on a transformer's in-context learning ability. Unlike existing methods, which mostly belong to the parametric paradigm and often require sufficient training samples, our work exploits the available full-modality data, offering a new perspective on the problem. The proposed data-dependent framework is more sample-efficient and empirically improves the classification model's performance on both full- and missing-modality data in the low-data regime across various multimodal learning tasks. When only 1% of the training data are available, our method achieves an average improvement of 6.1% over a recent strong baseline across various datasets and missing-modality states. Notably, our method also reduces the performance gap between full-modality and missing-modality data compared with the baseline.