Multimodal learning can complete the picture of information extraction by uncovering key dependencies between data sources. However, current systems fail to fully leverage multiple modalities for optimal performance. This has been attributed to modality competition, where modalities compete for training resources, leaving some underoptimized. We show that current balancing methods struggle to train multimodal models that surpass even simple baselines, such as ensembles. This raises the question: how can we ensure that all modalities in multimodal training are sufficiently trained, and that learning from new modalities consistently improves performance? This paper proposes the Multimodal Competition Regularizer (MCR), a new loss component inspired by mutual information (MI) decomposition and designed to prevent the adverse effects of competition in multimodal training. Our key contributions are: 1) Introducing game-theoretic principles in multimodal learning, where each modality acts as a player competing to maximize its influence on the final outcome, enabling automatic balancing of the MI terms. 2) Refining lower and upper bounds for each MI term to enhance the extraction of task-relevant unique and shared information across modalities. 3) Proposing latent-space permutations for conditional MI estimation, significantly improving computational efficiency. MCR outperforms all previously proposed training strategies and is the first to consistently improve multimodal learning beyond the ensemble baseline, clearly demonstrating that combining modalities leads to significant performance gains on both synthetic and large real-world datasets.
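The permutation idea in contribution 3 can be illustrated with a minimal sketch; this is an assumption about the general technique, not the paper's implementation. A standard Donsker-Varadhan lower bound on MI needs samples from both the joint distribution and the product of marginals, and the marginal samples can be obtained cheaply by permuting one modality's latents within a batch:

```python
import numpy as np

def mi_lower_bound(z1, z2, rng):
    """Donsker-Varadhan-style MI lower bound with a dot-product critic.

    z1, z2 : (n, d) paired latent representations of two modalities.
    Negative (marginal) pairs are created by permuting z2 within the
    batch, avoiding any extra forward passes through the encoders.
    """
    n = len(z1)
    joint = np.sum(z1 * z2, axis=1)           # critic scores on true pairs
    perm = rng.permutation(n)                 # batch permutation -> marginals
    marg = np.sum(z1 * z2[perm], axis=1)      # critic scores on shuffled pairs
    # DV bound: E_joint[f] - log E_marg[exp(f)]
    return joint.mean() - np.log(np.mean(np.exp(marg)) + 1e-12)
```

With unit-normalized latents the estimate is noticeably larger for correlated modality pairs than for independent ones, which is the signal a competition regularizer could balance across modalities.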