Multimodal sentiment analysis (MSA) systems leverage information from different modalities to predict human sentiment intensities. Incomplete modality is a critical issue that can cause a significant performance drop in MSA systems. Generative imputation, i.e., recovering the missing data from the available data, can make systems robust but incurs high computational costs. This paper introduces a knowledge distillation method, called `Multi-Modal Contrastive Knowledge Distillation' (MM-CKD), a novel non-imputation-based approach that addresses incomplete modality in video sentiment analysis at lower computational cost. We employ Multi-view Supervised Contrastive Learning (MVSC) to transfer knowledge from a teacher model to student models. This approach not only leverages cross-modal knowledge but also introduces cross-sample knowledge with supervision, jointly improving the performance of both teacher and student models through online learning. Our method achieves competitive results at significantly lower computational cost than state-of-the-art imputation-based methods.
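To make the supervised contrastive transfer idea concrete, here is a minimal sketch of a batch-level supervised contrastive distillation loss, in which each student embedding is pulled toward teacher embeddings of samples that share its sentiment label (cross-sample positives) and pushed away from the rest. The function name, the NumPy implementation, and the temperature parameter `tau` are illustrative assumptions for exposition, not the paper's actual MVSC formulation.

```python
import numpy as np

def sup_contrastive_distill_loss(student, teacher, labels, tau=0.1):
    """Hypothetical supervised contrastive distillation loss.

    student, teacher: (N, D) embedding matrices from the student and
    teacher models for the same batch of N samples.
    labels: (N,) integer class labels; same-label pairs are positives.
    Returns the mean negative log-probability over all positive pairs.
    """
    # L2-normalize so dot products are cosine similarities
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    sim = s @ t.T / tau  # (N, N) temperature-scaled similarity logits
    # row-wise log-softmax over teacher embeddings
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positive mask: teacher samples sharing the anchor's label
    pos = (labels[:, None] == labels[None, :]).astype(float)
    return float(-(pos * logprob).sum() / pos.sum())
```

When the student's embeddings align with the teacher's, same-label pairs dominate the softmax and the loss is small; a misaligned student incurs a large loss, which is the gradient signal that transfers the teacher's cross-modal and cross-sample structure.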