Despite the commendable achievements of existing work, prevailing multimodal sarcasm detection studies rely more on textual content than on visual information. This reliance unavoidably induces spurious correlations between textual words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm detection, which aims to evaluate models' generalizability when the word distribution differs between the training and testing settings. Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effect of biased textual factors for robust OOD generalization. In particular, we first design counterfactual data augmentation to construct positive samples with dissimilar word biases and negative samples with similar word biases. Subsequently, we devise an adapted debiasing contrastive learning mechanism that enables the model to learn robust task-relevant features and alleviates the adverse effect of biased words. Extensive experiments demonstrate the superiority of the proposed framework.
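The abstract does not state the exact form of the contrastive objective, so the following is only a minimal sketch under stated assumptions. It uses a generic InfoNCE-style loss (not necessarily the adapted debiasing loss proposed here) to illustrate the idea of pulling an anchor toward a positive sample with dissimilar word bias and pushing it away from negative samples with similar word bias. The function names `cosine` and `contrastive_loss`, the `temperature` parameter, and the toy two-dimensional feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    anchor:    features of the original sample
    positive:  features of an augmented sample with the same label but
               dissimilar word bias
    negatives: list of features of samples with similar word bias but a
               different label
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy features: the loss is low when the anchor sits near its positive
# and high when it sits near a negative instead.
anchor = [1.0, 0.0]
aligned = contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0]])
misaligned = contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1]])
print(aligned < misaligned)
```

In this sketch, minimizing the loss rewards features that stay close across samples whose words differ but whose label is shared, which is one plausible way to discourage a model from keying on biased words alone.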