Multimodal sentiment analysis (MSA) is an emerging research topic that aims to understand and recognize human sentiment or emotions through multiple modalities. However, in real-world dynamic scenarios, the distribution of target data is constantly changing and differs from that of the source data used to train the model, which leads to performance degradation. Common adaptation methods usually require access to the source data, which can pose privacy issues or storage overheads. Test-time adaptation (TTA) methods are therefore introduced to improve the performance of the model at inference time. However, existing TTA methods are typically built on probabilistic models and unimodal learning, and thus cannot be applied to MSA, which is commonly formulated as a multimodal regression task. In this paper, we propose two strategies, Contrastive Adaptation and Stable Pseudo-label generation (CASP), for test-time adaptation in multimodal sentiment analysis. The two strategies address distribution shifts in MSA by enforcing consistency and minimizing empirical risk, respectively. Extensive experiments show that CASP brings significant and consistent performance improvements across various distribution shift settings and with different backbones, demonstrating its effectiveness and versatility. Our code is available at https://github.com/zrguo/CASP.