Multimodal learning (MML) is significantly constrained by modality imbalance, leading to suboptimal performance in practice. Existing approaches primarily focus on balancing the learning of different modalities, yet they largely overlook the inherent disproportion in the classification ability of each modality, which is the primary cause of this phenomenon. In this paper, we propose a novel multimodal learning approach that dynamically balances the classification ability of weak and strong modalities by incorporating the principle of boosting. Concretely, we first propose a sustained boosting algorithm for multimodal learning that simultaneously optimizes the classification and residual errors. We then introduce an adaptive classifier assignment strategy to dynamically improve the classification performance of the weak modality. Furthermore, we theoretically analyze the convergence of the cross-modal gap function, ensuring the effectiveness of the proposed boosting scheme. In this way, the classification abilities of strong and weak modalities are expected to be balanced, thereby mitigating the imbalance issue. Empirical experiments on widely used datasets demonstrate the superiority of our method over various state-of-the-art (SOTA) multimodal learning baselines. The source code is available at https://github.com/njustkmg/NeurIPS25-AUG.