Multimodal learning (MML) is significantly constrained by modality imbalance, which leads to suboptimal performance in practice. Existing approaches primarily focus on balancing the learning of different modalities to address this issue, but they fundamentally overlook the inherent disproportion in classification ability across modality-specific models, which is the primary cause of this phenomenon. In this paper, we propose a novel multimodal learning approach that dynamically balances the classification ability of weak and strong modalities by incorporating the principle of boosting. Concretely, we first propose a sustained boosting algorithm for multimodal learning that simultaneously optimizes the classification and residual errors. We then introduce an adaptive classifier assignment strategy to dynamically improve the classification performance of the weak modality. Furthermore, we theoretically analyze the convergence of the cross-modal gap function, ensuring the effectiveness of the proposed boosting scheme. As a result, the classification abilities of the strong and weak modalities are expected to be balanced, thereby mitigating the imbalance issue. Empirical experiments on widely used datasets demonstrate the superiority of our method over various state-of-the-art (SOTA) multimodal learning baselines. The source code is available at https://github.com/njustkmg/NeurIPS25-AUG.