Stance detection is a challenging task that aims to identify public opinion on social media platforms toward specific targets. Previous work on stance detection has largely focused on text alone. In this paper, we study multi-modal stance detection for tweets that consist of both text and images, which are prevalent on today's fast-growing social media platforms, where people often post multi-modal messages. To this end, we create five new multi-modal stance detection datasets covering different domains based on Twitter, in which each example consists of a text and an image. In addition, we propose a simple yet effective Targeted Multi-modal Prompt Tuning (TMPT) framework, which leverages target information to learn multi-modal stance features from the textual and visual modalities. Experimental results on our five benchmark datasets show that the proposed TMPT achieves state-of-the-art performance in multi-modal stance detection.
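To make the idea of targeted multi-modal prompt tuning concrete, the following is a minimal, hypothetical sketch of the general pattern: learnable prompt tokens are conditioned on an embedding of the stance target and prepended to each modality's token sequence before pooling and classification. All names, dimensions, and the mean-pooling stand-in for a frozen encoder are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TargetedMultimodalPrompt(nn.Module):
    """Hypothetical sketch of target-conditioned prompt tuning over
    two modalities. Dimensions and components are assumptions."""
    def __init__(self, hidden=64, n_prompt=4, n_targets=5, n_classes=3):
        super().__init__()
        self.target_emb = nn.Embedding(n_targets, hidden)
        # Learnable prompt tokens, one set per modality.
        self.text_prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)
        self.image_prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)
        # Fused text+image features feed a stance classifier
        # (e.g. favor / against / neutral).
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, text_tokens, image_tokens, target_id):
        # Condition each modality's prompts on the target embedding.
        t = self.target_emb(target_id).unsqueeze(1)        # (B, 1, H)
        tp = self.text_prompt.unsqueeze(0) + t             # (B, P, H)
        ip = self.image_prompt.unsqueeze(0) + t            # (B, P, H)
        # Prepend prompts, then mean-pool each modality; the pooling is a
        # stand-in for a frozen transformer encoder in a real system.
        text_feat = torch.cat([tp, text_tokens], dim=1).mean(dim=1)
        image_feat = torch.cat([ip, image_tokens], dim=1).mean(dim=1)
        return self.classifier(torch.cat([text_feat, image_feat], dim=-1))

# Toy usage: batch of 2 tweets, 10 text tokens and 16 image patches each.
model = TargetedMultimodalPrompt()
logits = model(torch.randn(2, 10, 64), torch.randn(2, 16, 64),
               torch.tensor([0, 3]))
```

Only the prompt, target-embedding, and classifier parameters would be tuned in such a setup, which is what makes prompt tuning lightweight relative to full fine-tuning.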