Despite the commendable achievements of existing work, prevailing multimodal sarcasm detection studies rely more on textual content than on visual information. This reliance inevitably induces spurious correlations between textual words and labels, significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm detection, which evaluates a model's generalizability when the word distribution differs between the training and testing settings. Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effect of biased textual factors for robust OOD generalization. In particular, we first design counterfactual data augmentation to construct positive samples with dissimilar word biases and negative samples with similar word biases. Subsequently, we devise an adapted debiasing contrastive learning mechanism that enables the model to learn robust task-relevant features and alleviates the adverse effect of biased words. Extensive experiments show the superiority of the proposed framework.
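The contrastive objective described above, which pulls an anchor toward positive samples and pushes it away from negative samples, can be illustrated with a minimal sketch. The abstract does not specify the loss, so the InfoNCE-style form used here is an assumption, not the authors' formulation. The function names, the temperature value, and the toy two-dimensional embeddings are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor embedding (assumed form).

    positives: embeddings meant to stay close to the anchor, e.g. augmented
               samples with the same label but dissimilar word biases.
    negatives: embeddings meant to move away from the anchor, e.g. augmented
               samples that share similar word biases.
    """
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average the negative log-probability of each positive over all candidates.
    return -sum(math.log(s / denom) for s in pos) / len(pos)


# Toy embeddings (hypothetical): the anchor is close to `near` and orthogonal to `far`.
anchor = [1.0, 0.0]
near = [0.9, 0.1]
far = [0.0, 1.0]

aligned = contrastive_loss(anchor, positives=[near], negatives=[far])
misaligned = contrastive_loss(anchor, positives=[far], negatives=[near])
print(aligned < misaligned)
```

In this sketch the loss is small when the positive sample already sits near the anchor and large when the roles are swapped, which is the behavior a debiasing contrastive objective would rely on to discount shared word biases.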