Multimodal Sentiment Analysis aims to integrate information from multiple modalities, such as audio, visual, and text, to make complementary predictions. However, it often struggles with irrelevant or misleading visual and auditory information. Most existing approaches treat the entire information of a modality (e.g., a whole image, audio segment, or text paragraph) as a single unit for feature enhancement or denoising, suppressing redundant and noisy information at the risk of discarding critical content. To address this challenge, we propose MoLAN, a unified ModaLity-aware noise dynAmic editiNg framework. Specifically, MoLAN performs modality-aware blocking by dividing the features of each modality into multiple blocks. Each block is then dynamically assigned a distinct denoising strength based on its noise level and semantic relevance, enabling fine-grained noise suppression while preserving essential multimodal information. Notably, MoLAN is a unified and flexible framework that can be seamlessly integrated into a wide range of multimodal models. Building upon this framework, we further introduce MoLAN+, a new multimodal sentiment analysis approach. Experiments across five models and four datasets demonstrate the broad effectiveness of the MoLAN framework, and extensive evaluations show that MoLAN+ achieves state-of-the-art performance. The code is publicly available at https://github.com/betterfly123/MoLAN-Framework.
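The core idea of modality-aware blocking with per-block denoising strength can be illustrated with a minimal sketch. Everything here is assumed for illustration (the function name `block_wise_denoise`, the cosine-similarity relevance estimate, and the softmax-derived gates are not the paper's actual formulation); it only shows the general shape of splitting one modality's features into blocks and attenuating low-relevance blocks more strongly than high-relevance ones:

```python
import numpy as np

def block_wise_denoise(feats, anchor, num_blocks=4, temperature=1.0):
    """Illustrative block-wise soft denoising for one modality.

    feats:  (seq_len, dim) feature sequence of a single modality.
    anchor: (dim,) reference vector (e.g., pooled text features) used
            as a stand-in for semantic relevance estimation.
    Returns features of the same shape with per-block gates applied.
    """
    blocks = np.array_split(feats, num_blocks, axis=0)

    # Estimate each block's relevance as cosine similarity between the
    # block's mean feature and the anchor (an assumed proxy score).
    scores = []
    for b in blocks:
        mean = b.mean(axis=0)
        cos = mean @ anchor / (np.linalg.norm(mean) * np.linalg.norm(anchor) + 1e-8)
        scores.append(cos)

    # Turn scores into per-block denoising gates via a softmax; rescale so
    # the most relevant block passes almost unchanged (gate == 1.0) while
    # less relevant blocks are attenuated, i.e., denoised more aggressively.
    scores = np.array(scores) / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    gates = weights / weights.max()

    return np.concatenate([g * b for g, b in zip(gates, blocks)], axis=0)
```

Because each gate lies in (0, 1], no block is zeroed out entirely: the design suppresses noise gradually instead of discarding whole-modality information, which is the contrast the abstract draws against unit-level denoising.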