Multimodal learning exploits complementary information across heterogeneous modalities, but the informativeness of each modality can vary widely across samples and training stages. Existing multimodal curriculum learning strategies often assume that the relative complexity of samples remains unchanged throughout training, so they cannot adapt as the model evolves. We propose SPICE (Synergy and Partial Information based Curriculum Evolution), a novel progressive curriculum framework for multimodal interaction learning. Guided by Partial Information Decomposition (PID) theory, our approach decomposes multimodal interactions into redundant, unique, and synergistic information components, enabling an interpretable and dynamic characterization of sample complexity. Building on this decomposition, we design a curriculum that evolves throughout training, allowing the model to progress from shared cross-modal cues to modality-specific patterns and, finally, to complex synergistic interactions. To adapt to model evolution, sample ordering is refined in real time using PID estimates derived from unimodal and multimodal predictions. Experiments across multiple multimodal benchmarks demonstrate consistent improvements over conventional training and state-of-the-art baselines, highlighting the effectiveness of PID-based decomposition and adaptive sample ordering for multimodal curriculum learning.
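To make the redundant/unique/synergistic split concrete, the following is a minimal sketch of a two-modality PID. The abstract does not say which PID estimator SPICE uses, so this sketch assumes the minimum-mutual-information (MMI) redundancy, one common definition in the PID literature. The names `pid_mmi` and `PIDTerms` are hypothetical. The inputs are assumed to be mutual information estimates (in bits) between the label and each modality alone (`i_1`, `i_2`) and both modalities jointly (`i_12`).

```python
from typing import NamedTuple


class PIDTerms(NamedTuple):
    """Illustrative container for the four two-modality PID components."""
    redundant: float
    unique_1: float
    unique_2: float
    synergistic: float


def pid_mmi(i_1: float, i_2: float, i_12: float) -> PIDTerms:
    """Split I(X1, X2; Y) into PID terms.

    Assumption: redundancy is taken as min(I(X1; Y), I(X2; Y)), the MMI
    definition. This is not necessarily the estimator used by SPICE.
    """
    redundant = min(i_1, i_2)
    # The PID consistency equations fix the remaining terms:
    #   I(X1; Y)     = redundant + unique_1
    #   I(X2; Y)     = redundant + unique_2
    #   I(X1, X2; Y) = redundant + unique_1 + unique_2 + synergistic
    unique_1 = i_1 - redundant
    unique_2 = i_2 - redundant
    synergistic = i_12 - i_1 - i_2 + redundant
    return PIDTerms(redundant, unique_1, unique_2, synergistic)


# Example with made-up information values, in bits.
terms = pid_mmi(i_1=0.5, i_2=0.25, i_12=1.0)
print(terms)
```

With noisy estimates, `i_12` can fall below the larger unimodal value, in which case the synergistic term comes out negative. A practical implementation would need to decide how to handle that case.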