Recent advances in multimodal Variational Autoencoders (VAEs) have highlighted their potential for modeling complex data from multiple modalities. However, many existing approaches rely on relatively simple aggregation schemes that may not fully capture the complex dynamics between modalities. This work introduces a novel multimodal VAE that incorporates a Markov Random Field (MRF) into both the prior and the posterior distributions, with the aim of capturing complex intermodal interactions more effectively. Unlike previous models, our approach is specifically designed to model and exploit these relationships, enabling a more faithful representation of multimodal data. Our experiments show that the model performs competitively on the standard PolyMNIST dataset and achieves superior performance in handling complex intermodal dependencies on a synthetic dataset specially designed to test intricate relationships.
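To make the idea of an MRF over modality latents concrete, the following is a minimal sketch and not the model described in this work. It assumes a pairwise Gaussian MRF in which each modality has its own latent vector and every pair of modalities is linked by a single scalar coupling; the function names `gaussian_mrf_precision` and `sample_latents`, the `coupling` parameter, and the Gaussian form itself are illustrative assumptions. The point it illustrates is that non-zero off-diagonal blocks in the precision matrix induce dependencies between the per-modality latents, whereas a coupling of zero reduces to independent latents.

```python
import numpy as np


def gaussian_mrf_precision(num_modalities, latent_dim, coupling):
    """Precision matrix of a hypothetical pairwise Gaussian MRF.

    Each modality owns a latent block of size `latent_dim`. Diagonal blocks
    are identity; every off-diagonal block is -coupling * I, so matching
    latent dimensions of different modalities are directly linked.
    """
    M, d = num_modalities, latent_dim
    precision = np.eye(M * d)
    for i in range(M):
        for j in range(M):
            if i != j:
                precision[i * d:(i + 1) * d, j * d:(j + 1) * d] = -coupling * np.eye(d)
    # A Gaussian MRF is only a valid distribution if the precision is
    # positive definite; reject couplings that are too strong.
    if np.min(np.linalg.eigvalsh(precision)) <= 0:
        raise ValueError("coupling too strong: precision is not positive definite")
    return precision


def sample_latents(precision, num_samples, seed=0):
    """Draw zero-mean joint samples of all modality latents from the MRF."""
    rng = np.random.default_rng(seed)
    covariance = np.linalg.inv(precision)
    mean = np.zeros(precision.shape[0])
    return rng.multivariate_normal(mean, covariance, size=num_samples)


# Three modalities with 2-dimensional latents each (illustrative sizes).
coupled = gaussian_mrf_precision(num_modalities=3, latent_dim=2, coupling=0.3)
independent = gaussian_mrf_precision(num_modalities=3, latent_dim=2, coupling=0.0)

coupled_cov = np.linalg.inv(coupled)          # cross-modality covariances are non-zero
independent_cov = np.linalg.inv(independent)  # identity: latents are independent

samples = sample_latents(coupled, num_samples=1000)
```

In this sketch the coupling is a single hand-set scalar; in a learned model the pairwise terms would presumably be parameterized and trained, which is beyond what this example attempts to show.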