Multimodal Sentiment Analysis (MSA) integrates multiple modalities to infer human sentiment, but real-world noise often yields missing or corrupted data. Existing feature-disentanglement methods struggle to handle the internal variations of heterogeneous information under uncertain missingness, making it difficult to learn effective multimodal representations from degraded modalities. To address this issue, we propose DERL, a Disentangled Expert Representation Learning framework for robust MSA. Specifically, DERL employs hybrid experts to adaptively disentangle multimodal inputs into orthogonal private and shared representation spaces. A multi-level reconstruction strategy is further developed to provide collaborative supervision, enhancing both the expressiveness and robustness of the learned representations. Finally, the disentangled features serve as modality experts with distinct roles to generate importance-aware fusion results. Extensive experiments on two MSA benchmarks demonstrate that DERL outperforms state-of-the-art methods under various missing-modality conditions. For instance, our method achieves improvements of 2.47% in Acc-2 and 2.25% in MAE on MOSI under intra-modal missingness.
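The disentanglement into orthogonal private and shared subspaces can be illustrated with a minimal sketch. The abstract does not specify DERL's architecture, so everything below is a hypothetical simplification: a per-modality "expert" with two linear projections, plus an orthogonality penalty that pushes the private and shared features toward uncorrelated subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonality_penalty(private, shared):
    # Squared Frobenius norm of the cross-correlation between the two
    # feature sets; driving this toward zero encourages the private and
    # shared representations to occupy orthogonal subspaces.
    p = private - private.mean(axis=0)
    s = shared - shared.mean(axis=0)
    return np.linalg.norm(p.T @ s, ord="fro") ** 2 / p.shape[0]

class DisentangleExpert:
    # Hypothetical modality expert: splits one input feature into a
    # private part and a shared part via two learned projections
    # (random here; in practice trained jointly with the penalty above).
    def __init__(self, d_in, d_out):
        self.W_private = rng.standard_normal((d_in, d_out)) * 0.1
        self.W_shared = rng.standard_normal((d_in, d_out)) * 0.1

    def __call__(self, x):
        return x @ self.W_private, x @ self.W_shared

expert = DisentangleExpert(d_in=32, d_out=16)
x = rng.standard_normal((8, 32))           # batch of 8 modality features
private, shared = expert(x)
loss = orthogonality_penalty(private, shared)
print(private.shape, shared.shape)          # (8, 16) (8, 16)
```

In a full model, one such expert per modality would feed the reconstruction and fusion stages, with the penalty added to the training objective as a regularizer.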