Multimodal systems are vulnerable to partial or complete loss of input channels at deployment, which undermines reliability in real-world settings. This paper presents ModalImmune, a training framework that enforces modality immunity by intentionally and controllably collapsing selected modality information during training so the model learns joint representations that are robust to destructive modality influence. The framework combines a spectrum-adaptive collapse regularizer, an information-gain guided controller for targeted interventions, curvature-aware gradient masking to stabilize destructive updates, and a certified Neumann-truncated hyper-gradient procedure for automatic meta-parameter adaptation. Empirical evaluation on standard multimodal benchmarks demonstrates that ModalImmune improves resilience to modality removal and corruption while retaining convergence stability and reconstruction capacity.