Multimodal models often converge to a dominant-modality solution, in which a stronger, faster-converging modality overshadows weaker ones, and this modality imbalance leads to suboptimal performance. Existing methods attempt to rebalance modalities by reweighting gradients or losses, but they overlook the fact that each modality has a finite information capacity. In this work, we propose IIBalance, a multimodal learning framework that aligns modality contributions with Intrinsic Information Budgets (IIB). We first introduce a task-grounded estimator of each modality's IIB, which transforms its capacity into a global prior over modality contributions. Anchored by the highest-budget modality, we then design a prototype-based relative alignment mechanism that corrects semantic drift only when weaker modalities deviate from their budgeted potential, rather than forcing imitation. During inference, a probabilistic gating module integrates the global budgets with sample-level uncertainty to generate calibrated fusion weights. Experiments on three representative benchmarks demonstrate that IIBalance consistently outperforms state-of-the-art balancing methods and achieves better utilization of complementary modality cues. Our code is available at: https://github.com/XiongZechang/IIBalance.
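To make the inference-time gating idea concrete, the following is a minimal sketch of one way a global per-modality budget could be combined with sample-level uncertainty to produce fusion weights. The abstract does not give the gating formula, so everything here is an assumption made for illustration: the function names (`fusion_weights`, `fuse`), the use of predictive entropy as the uncertainty measure, and the scoring rule `log(budget) - entropy` followed by a softmax. It should not be read as the IIBalance implementation, which is available in the linked repository.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fusion_weights(budgets, modality_probs):
    """Hypothetical gate, not the paper's formula.

    budgets: positive global prior per modality (assumed given).
    modality_probs: per-modality class probabilities for one sample.
    Each modality is scored as log(budget) - entropy(prediction),
    so a confident prediction raises that modality's weight.
    """
    scores = [math.log(b) - entropy(p) for b, p in zip(budgets, modality_probs)]
    return softmax(scores)


def fuse(budgets, modality_probs):
    """Return (weights, fused class probabilities) for one sample."""
    weights = fusion_weights(budgets, modality_probs)
    n_classes = len(modality_probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, modality_probs))
        for c in range(n_classes)
    ]
    return weights, fused


if __name__ == "__main__":
    # Illustrative numbers only: modality 0 has the larger budget but is
    # maximally uncertain on this sample; modality 1 is confident.
    weights, fused = fuse([0.6, 0.4], [[0.5, 0.5], [0.99, 0.01]])
    print(weights, fused)
```

Under this assumed scoring rule, the weights reduce to the normalized budgets whenever all modalities are equally uncertain, so the global prior acts as the default and per-sample uncertainty only shifts weight away from it when the modalities disagree in confidence.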