Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts assume modality completeness. In real-world applications, however, practical factors cause uncertain modality missingness, which severely degrades model performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the MSA task under uncertain missing modalities. Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics. Moreover, a category-guided prototype distillation mechanism is introduced to capture cross-category correlations using category prototypes, aligning feature distributions and generating favorable joint representations. Finally, we design a response-disentangled consistency distillation strategy that optimizes the sentiment decision boundaries of the student network through response disentanglement and mutual information maximization. Comprehensive experiments on three datasets show that our framework achieves consistent improvements over several baselines.
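To make the sample-level contrastive distillation idea concrete, the following is a minimal illustrative sketch (not the paper's actual formulation): an InfoNCE-style loss in which each student embedding, computed from missing-modality inputs, is pulled toward the teacher embedding of the same sample and pushed away from the teacher embeddings of other samples in the batch. The function name, temperature value, and NumPy formulation are all assumptions for illustration.

```python
import numpy as np

def contrastive_distill_loss(student, teacher, temperature=0.1):
    """Illustrative InfoNCE-style sample-level contrastive distillation.

    student: (B, D) embeddings from the missing-modality student network.
    teacher: (B, D) embeddings from the complete-modality teacher network.
    The positive pair for row i is (student[i], teacher[i]); all other
    teacher rows act as negatives, so the student inherits cross-sample
    correlations from the teacher. Hypothetical sketch, not CorrKD itself.
    """
    # L2-normalize so dot products are cosine similarities
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal (same-sample teacher embedding)
    return -np.mean(np.diag(log_prob))

# Toy check: a student aligned with the teacher should incur a lower loss
# than one whose embeddings are mismatched across samples.
rng = np.random.default_rng(0)
t = rng.normal(size=(8, 16))
aligned = contrastive_distill_loss(t.copy(), t)
shuffled = contrastive_distill_loss(t[::-1].copy(), t)
print(aligned, shuffled)
```

The diagonal of the similarity matrix carries the same-sample (positive) pairs, so minimizing this loss drives the student representation of an incomplete input toward the teacher's full-modality representation while keeping it separated from other samples.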