Multimodal Sentiment Analysis (MSA) integrates diverse modalities (text, audio, and video) to comprehensively analyze and understand individuals' emotional states. However, incomplete data is prevalent in real-world settings and poses significant challenges to MSA, mainly because modalities go missing at random. Moreover, the heterogeneity of multimodal data has yet to be effectively addressed. To tackle these challenges, we introduce the Modality-Invariant Bidirectional Temporal Representation Distillation Network (MITR-DNet) for Missing Multimodal Sentiment Analysis. MITR-DNet employs a distillation approach in which a complete-modality teacher model guides a missing-modality student model, ensuring robustness when modalities are missing. In addition, we develop the Modality-Invariant Bidirectional Temporal Representation Learning Module (MIB-TRL) to mitigate heterogeneity.
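The teacher-student idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the MITR-DNet implementation: the function names (`mask_modalities`, `fuse`, `distillation_loss`), the mean-based stand-in for an encoder, the toy feature vectors, and the choice of mean squared error as the distillation objective, which the abstract does not specify.

```python
import random

def mask_modalities(features, drop_prob, rng):
    """Randomly zero out whole modalities, always keeping at least one (hypothetical helper)."""
    names = list(features)
    kept = [n for n in names if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(names)]
    return {n: (v if n in kept else [0.0] * len(v)) for n, v in features.items()}

def fuse(features):
    """Toy stand-in for an encoder: element-wise mean across modality vectors."""
    vectors = list(features.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def distillation_loss(teacher_repr, student_repr):
    """Mean squared error between teacher and student representations (assumed objective)."""
    squared = [(t - s) ** 2 for t, s in zip(teacher_repr, student_repr)]
    return sum(squared) / len(squared)

rng = random.Random(0)

# Toy per-modality feature vectors; real inputs would be temporal sequences.
sample = {
    "text": [0.2, 0.4, 0.6],
    "audio": [0.1, 0.3, 0.5],
    "video": [0.3, 0.5, 0.7],
}

# The teacher sees every modality; the student sees a randomly masked copy.
teacher_repr = fuse(sample)
student_repr = fuse(mask_modalities(sample, drop_prob=0.5, rng=rng))

loss = distillation_loss(teacher_repr, student_repr)
print(f"distillation loss: {loss:.4f}")
```

In this sketch the loss is zero when no modality is dropped and grows as the student's representation drifts from the teacher's, which is the signal a distillation objective of this kind would minimize during training.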