We study multimodal learning under missing modalities, motivated by bioscience applications in which heterogeneous modalities are often only partially available when decisions must be made. We propose Latent World Recovery (LWR), a framework built on two key ideas: (i) modality-specific embeddings are aligned in a shared latent space, and (ii) a unified representation is constructed, at both training and inference time, by fusing only the embeddings of the modalities that are actually available. Rather than imputing missing modalities or requiring a fixed modality set, LWR treats each modality as a partial perception of an underlying latent state and learns availability-aware representations directly from the observed modalities. This combination of neighbor-based latent alignment and availability-aware modality fusion enables robust multimodal prediction under partial observation while avoiding the error propagation that comes from explicitly reconstructing missing modalities. We evaluate the framework on real-world incomplete multi-omics benchmarks and show that it is effective for downstream tasks such as cancer phenotype classification and survival prediction.
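As an illustration of idea (ii), the following is a minimal sketch of availability-aware fusion, assuming the per-modality embeddings have already been aligned in a shared latent space. The abstract does not specify the fusion operator, so the masked mean used here is an assumption rather than the method of LWR. The function name `fuse_available`, the array layout, and the use of NaN as a placeholder for missing modalities are likewise hypothetical.

```python
import numpy as np


def fuse_available(embeddings, mask):
    """Fuse only the observed modality embeddings with a masked mean.

    ASSUMPTION: a masked mean stands in for the fusion operator, which
    the abstract does not specify.

    embeddings: array of shape (n_samples, n_modalities, dim); slots for
                missing modalities may hold any placeholder, including NaN.
    mask:       array of shape (n_samples, n_modalities); 1 where the
                modality is observed, 0 where it is missing.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    mask = np.asarray(mask, dtype=float)

    counts = mask.sum(axis=1, keepdims=True)  # observed modalities per sample
    if np.any(counts == 0):
        raise ValueError("each sample needs at least one observed modality")

    # Replace unobserved slots with zeros so placeholders never contribute.
    observed = np.where(mask[..., None] > 0, embeddings, 0.0)

    # Average over the modalities that are actually present.
    return observed.sum(axis=1) / counts


# Toy example: 2 samples, 3 modalities, 2-dimensional shared latent space.
nan = np.nan
emb = np.array([
    [[1.0, 2.0], [nan, nan], [3.0, 4.0]],  # sample 0: modalities 0 and 2 observed
    [[nan, nan], [5.0, 6.0], [nan, nan]],  # sample 1: only modality 1 observed
])
mask = np.array([
    [1, 0, 1],
    [0, 1, 0],
])

fused = fuse_available(emb, mask)
print(fused)  # one fused vector per sample, shape (2, 2)
```

Because the missing slots are masked out rather than filled in, no reconstruction of absent modalities is needed, and the same function applies unchanged whether the mask comes from a training batch or from a single sample at inference time.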