Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Developing effective multimodal fusion approaches has become increasingly essential in many real-world scenarios, such as health care and finance. The key challenge is how to preserve the feature expressiveness of each modality while learning cross-modal interactions. Previous approaches primarily focus on cross-modal alignment, but over-emphasizing the alignment of the modalities' marginal distributions may impose excessive regularization and obstruct meaningful representations within each modality. The Dirichlet process (DP) mixture model is a powerful Bayesian non-parametric method that can amplify the most prominent features through its richer-gets-richer property, which allocates increasing weights to them. Inspired by this unique characteristic of DP, we propose a new DP-driven multimodal learning framework that automatically achieves an optimal balance between prominent intra-modal representation learning and cross-modal alignment. Specifically, we assume that each modality follows a mixture of multivariate Gaussian distributions and further adopt DP to calculate the mixture weights of all the components. This paradigm allows DP to dynamically allocate the contributions of features and select the most prominent ones via its richer-gets-richer property, thus facilitating multimodal feature fusion. Extensive experiments on several multimodal datasets demonstrate that our model outperforms competing methods. Ablation analysis further validates the effectiveness of DP in aligning modality distributions and its robustness to changes in key hyperparameters. Code is anonymously available at https://github.com/HKU-MedAI/DPMM.git
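As a rough illustration of how a DP can "calculate the mixture weights" of Gaussian components and favour the richer ones, the Python sketch below uses a truncated stick-breaking construction. It is not the DPMM repository's implementation. Assumptions not taken from the abstract: diagonal covariances, the standard mean-field update q(v_k) = Beta(1 + N_k, α + Σ_{j>k} N_j) with N_k the soft count of component k, expected weights used in place of expected log weights, and hypothetical helper names (`stick_breaking_weights`, `responsibilities`, `update_weights`).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def stick_breaking_weights(counts: Sequence[float], alpha: float) -> List[float]:
    """Expected DP mixture weights under a truncated mean-field stick-breaking posterior.

    Assumption: q(v_k) = Beta(1 + N_k, alpha + sum_{j>k} N_j), the standard truncated
    variational DP update, where N_k is the soft count of component k.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    K = len(counts)
    weights: List[float] = []
    remaining = 1.0  # expected length of the stick not yet broken off
    for k in range(K):
        if k == K - 1:
            weights.append(remaining)  # truncation: the last component takes the rest
            break
        a = 1.0 + counts[k]
        b = alpha + sum(counts[k + 1:])  # mass claimed by later components
        v_mean = a / (a + b)  # E[v_k] for Beta(a, b); the v_k are independent under q
        weights.append(remaining * v_mean)
        remaining *= 1.0 - v_mean
    return weights


def diag_gaussian_logpdf(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a multivariate Gaussian with diagonal covariance (an assumption)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x: Vector, means, variances, weights) -> List[float]:
    """Posterior probability that x came from each component, via log-sum-exp."""
    logits = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_weights(points, means, variances, weights, alpha: float) -> List[float]:
    """One E-step plus weight update: accumulate soft counts N_k, then re-break the stick."""
    counts = [0.0] * len(weights)
    for x in points:
        for k, r in enumerate(responsibilities(x, means, variances, weights)):
            counts[k] += r
    return stick_breaking_weights(counts, alpha)


if __name__ == "__main__":
    means = [[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]]
    variances = [[1.0, 1.0]] * 3
    # Toy data (made up for illustration): most points sit near the first component.
    points = [[0.1, -0.2], [-0.3, 0.1], [0.2, 0.3], [0.0, 0.1], [2.9, 3.1], [-3.1, 2.8]]
    weights = [1.0 / 3] * 3
    for step in range(3):
        weights = update_weights(points, means, variances, weights, alpha=1.0)
        print(step, [round(w, 3) for w in weights])
```

Components that already explain more points receive larger soft counts N_k, so their stick proportions and weights grow, which in turn tends to raise their responsibilities on the next pass. This is one way to see the richer-gets-richer behaviour; the concentration parameter α controls how much mass is left for the less prominent components.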

翻译：在许多现实场景（如医疗保健和金融）中，开发有效的多模态融合方法已变得日益重要。其核心挑战在于如何在学习跨模态交互的同时保持各模态的特征表达能力。先前方法主要关注跨模态对齐，但过度强调模态边缘分布的对齐可能施加过度的正则化，并阻碍各模态内有意义的表征。狄利克雷过程（DP）混合模型是一种强大的贝叶斯非参数方法，其“富者愈富”特性可通过为最显著特征分配递增的权重来增强这些特征。受DP这一独特性质的启发，我们提出了一种新的DP驱动的多模态学习框架，该框架能自动实现显著的模态内表征学习与跨模态对齐之间的最优平衡。具体而言，我们假设每个模态服从多元高斯分布的混合，并进一步采用DP计算所有分量的混合权重。该范式允许DP利用其“富者愈富”特性动态分配特征的贡献度并选择最显著的特征，从而促进多模态特征融合。在多个多模态数据集上的大量实验表明，我们的模型性能优于其他竞争方法。消融分析进一步验证了DP在模态分布对齐方面的有效性及其对关键超参数变化的鲁棒性。代码匿名发布于 https://github.com/HKU-MedAI/DPMM.git