Multimodal machine learning with missing modalities is an increasingly relevant challenge arising in various applications such as healthcare. This paper extends the current research on missing modalities to the low-data regime, where a downstream task suffers from both missing modalities and a limited sample size. This setting is particularly challenging yet practical, as obtaining full-modality data and sufficient annotated training samples is often expensive. We propose retrieval-augmented in-context learning to address these two crucial issues by unleashing the potential of a transformer's in-context learning ability. Diverging from existing methods, which primarily follow the parametric paradigm and often require sufficient training samples, our work exploits the value of the available full-modality data, offering a novel perspective on resolving the challenge. The proposed data-dependent framework exhibits higher sample efficiency and is empirically demonstrated to enhance the classification model's performance on both full- and missing-modality data in the low-data regime across various multimodal learning tasks. When only 1% of the training data is available, our method achieves an average improvement of 6.1% over a recent strong baseline across various datasets and missing states. Notably, it also narrows the performance gap between full-modality and missing-modality data compared with the baseline.
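The retrieval-augmented in-context learning idea can be illustrated with a minimal sketch: given an embedding of a (possibly missing-modality) query sample, retrieve its nearest full-modality neighbors from the available pool and prepend them as in-context examples for a transformer classifier. This is a simplified illustration under assumed names (`retrieve_neighbors`, `build_in_context_sequence` are hypothetical helpers, not the paper's actual implementation):

```python
import numpy as np

def retrieve_neighbors(query_emb, pool_embs, k=3):
    """Return indices of the k most cosine-similar full-modality samples.

    Hypothetical helper: the paper's retriever may differ in metric and indexing.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q  # cosine similarity of each pool sample to the query
    return np.argsort(-sims)[:k]

def build_in_context_sequence(query_emb, pool_embs, pool_labels, k=3):
    """Concatenate retrieved full-modality embeddings with the query embedding.

    The resulting sequence would be fed to a transformer, whose in-context
    learning ability can exploit the retrieved examples and their labels.
    """
    idx = retrieve_neighbors(query_emb, pool_embs, k)
    context = pool_embs[idx]                      # (k, d) retrieved examples
    query = query_emb[None, :]                    # (1, d) query as last token
    return np.concatenate([context, query], axis=0), pool_labels[idx]
```

Because the context is assembled at inference time from the data itself, the framework is data-dependent rather than purely parametric, which is why it can remain sample-efficient when little training data is available.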