Multimodal Sentiment Analysis (MSA) with missing modalities has attracted increasing attention. While current Transformer-based methods leverage dense text information to maintain model robustness, their quadratic complexity hinders efficient long-range modeling and multimodal fusion. To this end, we propose a novel and efficient Text-enhanced Fusion Mamba (TF-Mamba) framework for robust MSA with missing modalities. Specifically, a Text-aware Modality Enhancement (TME) module aligns and enriches the non-text modalities while reconstructing missing text semantics. Moreover, we develop Text-based Context Mamba (TC-Mamba) to capture intra-modal contextual dependencies under text collaboration. Finally, Text-guided Query Mamba (TQ-Mamba) queries text-guided multimodal information and learns joint representations for sentiment prediction. Extensive experiments on three MSA datasets demonstrate the effectiveness and efficiency of the proposed method under missing-modality scenarios. Our code is available at https://github.com/codemous/TF-Mamba.
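The efficiency argument rests on Mamba's selective state-space recurrence. Instead of forming an L × L attention matrix, it processes a sequence of length L in one causal scan, costing O(L · N) for state size N. The pure-Python sketch below shows a single feature channel of such a scan. It is a generic illustration under stated assumptions, not the TF-Mamba implementation: the function name `selective_scan`, the per-state projections `w_b` and `w_c`, the scalar step projection `w_delta`, and the simplified Euler-style discretization of B are hypothetical choices. The TME, TC-Mamba and TQ-Mamba modules are not modeled.

```python
import math
from typing import List


def softplus(z: float) -> float:
    """Numerically stable softplus, used to keep the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: List[float], a: List[float], w_delta: float,
                   w_b: List[float], w_c: List[float]) -> List[float]:
    """Single-channel selective SSM scan (hypothetical Mamba-style sketch).

    x:       input sequence of length L (one feature channel).
    a:       diagonal state matrix entries, length N (negative for stability).
    w_delta: hypothetical projection making the step size input-dependent.
    w_b/w_c: hypothetical per-state projections for the input-dependent
             B_t and C_t vectors.
    One pass over the sequence: O(L * N) work, no L x L matrix.
    """
    n = len(a)
    h = [0.0] * n                      # hidden state, one entry per state dim
    ys: List[float] = []
    for x_t in x:
        delta = softplus(w_delta * x_t)           # input-dependent step size
        y_t = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])        # zero-order-hold for A
            b_t = w_b[i] * x_t                    # selective input matrix
            c_t = w_c[i] * x_t                    # selective output matrix
            h[i] = a_bar * h[i] + delta * b_t * x_t  # Euler-style B update
            y_t += c_t * h[i]
        ys.append(y_t)
    return ys


if __name__ == "__main__":
    out = selective_scan([1.0, 2.0, 3.0], a=[-1.0, -0.5],
                         w_delta=0.1, w_b=[1.0, 0.5], w_c=[0.3, 0.7])
    print(out)
```

Because each output depends only on the current and earlier inputs, the scan is causal, and its cost grows linearly rather than quadratically with sequence length. This linear growth is the property the abstract contrasts with Transformer self-attention.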