Generative models for multimodal data permit the identification of latent factors that may be associated with important determinants of observed data heterogeneity. Common or shared factors could be important for explaining variation across modalities, whereas other factors may be private and important only for explaining a single modality. Multimodal Variational Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those underlying latent factors and separating shared variation from private variation. In this work, we investigate their capability to reliably perform this disentanglement. In particular, we highlight a challenging problem setting where modality-specific variation dominates the shared signal. Taking a cross-modal prediction perspective, we demonstrate limitations of existing models and propose a modification that makes them more robust to modality-specific variation. Our findings are supported by experiments on synthetic as well as various real-world multi-omics data sets.