Multimodal models trained on complete-modality data often suffer a substantial drop in performance when faced with imperfect data that contains corruptions or missing modalities. To address this robustness challenge, prior methods have drawn on augmentation, consistency, or uncertainty, but these approaches carry drawbacks related to data complexity, representation, and learning that can diminish their overall effectiveness. In response, this study introduces Redundancy-Adaptive Multimodal Learning (RAML). RAML efficiently harnesses the information redundancy across modalities to counter the problems posed by imperfect data while remaining compatible with complete-modality input. Specifically, RAML achieves redundancy-lossless information extraction through separate unimodal discriminative tasks and enforces a proper norm constraint on each unimodal feature representation. Furthermore, RAML explicitly enhances multimodal fusion by leveraging fine-grained redundancy among unimodal features to learn correspondences between corrupted and untainted information. Extensive experiments on various benchmark datasets under diverse conditions consistently demonstrate that RAML outperforms state-of-the-art methods by a significant margin.
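The extraction step described above (one discriminative task per modality, plus a norm constraint on each unimodal feature) can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything concrete here is an assumption made for illustration: the linear classification heads, the squared-deviation form of the norm penalty, the `target_norm` and `weight` values, and all function names are hypothetical and are not taken from RAML itself.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def norm_penalty(features, target_norm=1.0):
    """Assumed constraint: penalize deviation of each feature's L2 norm from target_norm."""
    norms = np.linalg.norm(features, axis=1)
    return ((norms - target_norm) ** 2).mean()

def unimodal_objective(features_by_modality, heads, labels, weight=0.1):
    """Sum, over modalities, of a per-modality classification loss and a norm penalty."""
    total = 0.0
    for name, feats in features_by_modality.items():
        logits = feats @ heads[name]  # hypothetical linear head for this modality
        total += softmax_cross_entropy(logits, labels) + weight * norm_penalty(feats)
    return total

# Toy data: 4 samples, 8-dimensional features, 3 classes, two made-up modalities.
rng = np.random.default_rng(0)
labels = np.array([0, 1, 2, 1])
features = {"audio": rng.normal(size=(4, 8)), "video": rng.normal(size=(4, 8))}
heads = {name: rng.normal(size=(8, 3)) for name in features}
loss = unimodal_objective(features, heads, labels)
```

Because each modality gets its own supervised task in this sketch, every encoder is pushed to stay discriminative on its own, and the penalty keeps feature magnitudes comparable across modalities before fusion. How RAML actually defines its norm constraint and its redundancy-based fusion is not specified in the abstract and is not modeled here.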