Because of the notorious modality imbalance problem, multimodal learning (MML) suffers from optimization imbalance and therefore struggles to achieve satisfactory performance. Several representative methods have recently been proposed to address this, mainly by adaptively adjusting the optimization of each modality to rebalance the learning speed of dominant and non-dominant modalities. To better facilitate the interaction of model information across modalities, in this paper we propose a novel multimodal learning method called modal-aware interactive enhancement (MIE). Specifically, we first use an optimization strategy based on sharpness-aware minimization (SAM) to smooth the learning objective during the forward phase. Then, exploiting the geometric properties of SAM, we propose a gradient modification strategy that lets the modalities influence one another during the backward phase. In this way, MIE simultaneously improves the generalization ability of multimodal learning and alleviates the modality forgetting phenomenon. Extensive experiments on widely used datasets demonstrate that the proposed method outperforms various state-of-the-art baselines and achieves the best performance.
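For readers unfamiliar with SAM, the generic update that MIE builds on can be sketched as two steps. First, perturb the weights toward the approximate worst-case point inside an L2 ball of radius `rho`. Then descend using the gradient evaluated at that perturbed point. The sketch below is a minimal illustration of standard SAM only, assuming a hypothetical toy quadratic objective (`A`, `loss`, `grad`, `sam_step`, `lr`, `rho` are all illustrative names and values). It does not implement MIE's cross-modal gradient modification, whose details are not given here.

```python
import numpy as np

# Hypothetical toy objective: an anisotropic quadratic 0.5 * w^T A w,
# standing in for a model's training loss (not MIE's actual objective).
A = np.diag([10.0, 1.0])


def loss(w: np.ndarray) -> float:
    return 0.5 * float(w @ A @ w)


def grad(w: np.ndarray) -> np.ndarray:
    # Analytic gradient of the toy quadratic.
    return A @ w


def sam_step(w: np.ndarray, lr: float = 0.05, rho: float = 0.01) -> np.ndarray:
    """One generic SAM update (illustrative sketch, not the MIE method)."""
    g = grad(w)
    # Ascent step: move to the approximate worst-case point within an
    # L2 ball of radius rho (the small constant guards against g == 0).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: apply the gradient measured at the perturbed weights
    # to the original weights.
    g_sam = grad(w + eps)
    return w - lr * g_sam
```

Running `sam_step` repeatedly from `w = np.array([1.0, 1.0])` drives the toy loss down, though the fixed-radius perturbation keeps the iterate hovering near, rather than exactly at, the minimum. Setting `rho=0.0` reduces the update to plain gradient descent, which makes the role of the perturbation easy to isolate.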