With the rapid development of multimedia, the shift from unimodal textual sentiment analysis to multimodal image-text sentiment analysis has attracted growing attention from academia and industry in recent years. However, multimodal sentiment analysis suffers from unimodal data bias: for example, text sentiment can be misleading due to explicit sentiment semantics, lowering the accuracy of the final sentiment classification. In this paper, we propose a novel CounterFactual Multimodal Sentiment Analysis framework (CF-MSA) that uses counterfactual inference to model the causal effects in multimodal sentiment analysis. CF-MSA mitigates the direct effect of unimodal bias and preserves heterogeneity across modalities by differentiating the treatment variables between modalities. In addition, considering the information complementarity and bias differences between modalities, we propose a new optimisation objective that effectively integrates the modalities and reduces the inherent bias of each. Experimental results on two public datasets, MVSA-Single and MVSA-Multiple, demonstrate that CF-MSA has superior debiasing capability and achieves new state-of-the-art performance. We will release the code and datasets to facilitate future research.
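To make the counterfactual debiasing idea concrete, the sketch below shows one common instantiation from the causal-inference literature: subtracting the natural direct effect (NDE) of each unimodal bias path from the total effect (TE) at inference time. This is a minimal, hypothetical illustration, not the CF-MSA implementation; the additive branch fusion and the names `z_t`, `z_v`, `z_f`, `c_t`, `c_v` are assumptions made for the example, and the paper's actual formulation (including its per-modality treatment variables and optimisation objective) is defined in the method section.

```python
import torch


def counterfactual_debias(z_t: torch.Tensor,
                          z_v: torch.Tensor,
                          z_f: torch.Tensor,
                          c_t: torch.Tensor,
                          c_v: torch.Tensor) -> torch.Tensor:
    """Illustrative counterfactual debiasing for image-text sentiment.

    z_t, z_v: unimodal (text / image) branch logits, shape (batch, classes)
    z_f:      fused multimodal branch logits, shape (batch, classes)
    c_t, c_v: constants standing in for each branch's output in the
              counterfactual (modality-absent) world, shape (classes,)
    """
    # Factual world: total effect with all causal paths active
    # (simple additive fusion, purely for illustration).
    te = z_t + z_v + z_f
    # Counterfactual world for the text bias path: keep the text logits
    # factual while the image branch is imagined absent; this isolates
    # the natural direct effect of text.
    nde_t = z_t + c_v
    # Same construction for the image bias path.
    nde_v = c_t + z_v
    # Debiased prediction: remove both unimodal direct effects from the
    # total effect, leaving the effect mediated by multimodal fusion
    # (up to a constant shift that does not change the argmax).
    return te - nde_t - nde_v


if __name__ == "__main__":
    # Toy usage: batch of 4 samples, 3 sentiment classes.
    B, C = 4, 3
    z_t, z_v, z_f = (torch.randn(B, C) for _ in range(3))
    c_t, c_v = torch.zeros(C), torch.zeros(C)
    print(counterfactual_debias(z_t, z_v, z_f, c_t, c_v).argmax(dim=-1))
```

In this simplified additive setting the debiased score reduces to the fusion-mediated logits, which is exactly the intent of removing each modality's direct (bias) effect; a full implementation would also learn the counterfactual constants and combine the branches with the paper's optimisation objective.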