This paper addresses the missing-modality challenge in multi-modal learning by introducing Unsupervised Learning for Missing Modalities in Multi-Modal Learning (UL4M4), a flexible framework that imputes missing feature embeddings in a task-independent manner before supervised prediction. We propose modality-specific normalization and a novel partial-modality distance metric that together enable fair clustering of incomplete observations, capturing cross-modal structure while preserving scale invariance across varying dimensionalities and modality counts. Cluster centers from this unsupervised stage guide an iterative greedy imputation process for any missing modalities during training or inference, supporting arbitrary numbers of modalities and arbitrary missing patterns per sample. The imputation module is lightweight, uses frozen encoders, and is decoupled from the downstream task, allowing easy integration with any fusion/prediction architecture. Extensive experiments under diverse and highly incomplete regimes demonstrate UL4M4's robustness: to the best of our knowledge, it is the first method to achieve F1-Micro scores consistently above 0.7 on challenging missing configurations, even when more than 50% of modality slots are missing. Its results are also stable across cluster sizes, and it significantly outperforms state-of-the-art baselines. Code is available at https://github.com/h-ismkhan/Multimodal-Learning-with-Missing-Modalities-via-Unsupervised-Learning.
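To make the idea of center-guided imputation concrete, the following is a minimal sketch and not the UL4M4 implementation. It makes several assumptions that the abstract does not specify: L2 normalization stands in for the modality-specific normalization, the partial distance is taken to be the per-dimension mean squared difference averaged over observed modalities, and a single nearest-center lookup replaces the iterative greedy procedure. All function names and the toy centers are hypothetical.

```python
import math
from typing import Dict, List, Optional

# A sample maps each modality name to its embedding, or None if missing.
Sample = Dict[str, Optional[List[float]]]
Center = Dict[str, List[float]]


def l2_normalize(vec: List[float]) -> List[float]:
    """Assumed stand-in for modality-specific normalization."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def partial_distance(sample: Sample, center: Center) -> float:
    """Distance computed only over the modalities observed in `sample`.

    Dividing by the embedding length and by the number of observed
    modalities keeps the value comparable across dimensionalities and
    across samples with different numbers of observed modalities
    (an assumed form, not the paper's metric).
    """
    total, observed = 0.0, 0
    for name, emb in sample.items():
        if emb is None:
            continue
        ref = center[name]
        total += sum((a - b) ** 2 for a, b in zip(emb, ref)) / len(emb)
        observed += 1
    if observed == 0:
        raise ValueError("sample has no observed modality")
    return total / observed


def impute_from_centers(sample: Sample, centers: List[Center]) -> Dict[str, List[float]]:
    """Fill each missing modality with the nearest center's embedding."""
    nearest = min(centers, key=lambda c: partial_distance(sample, c))
    return {
        name: (emb if emb is not None else list(nearest[name]))
        for name, emb in sample.items()
    }


# Toy cluster centers over two modalities with different dimensionalities.
centers = [
    {"image": [1.0, 0.0], "text": [1.0, 0.0, 0.0]},
    {"image": [0.0, 1.0], "text": [0.0, 0.0, 1.0]},
]
# The text modality is missing; only the image embedding is observed.
sample = {"image": l2_normalize([3.0, 0.5]), "text": None}
filled = impute_from_centers(sample, centers)
```

Here the observed image embedding lies closest to the first center, so the missing text embedding is copied from that center while the observed modality is left unchanged. Because the step touches only embeddings, it can sit between frozen encoders and any downstream fusion model.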