Hyperbole, or exaggeration, is a common linguistic phenomenon, and detecting it is an important part of understanding human expression. Several studies have addressed hyperbole detection, but most focus on the text modality only. With the development of social media, however, people can create hyperbolic expressions in various modalities, including text, images, and videos. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset\footnote{The dataset will be released to the community.} from Weibo (a Chinese social media platform) and carry out several studies on it. We treat the text and image of a Weibo post as two modalities and explore the role each plays in hyperbole detection. We also evaluate different pre-trained multimodal encoders on this downstream task to compare their performance. In addition, since the dataset covers five different topics, we evaluate the cross-domain performance of different models. These studies can serve as a benchmark and indicate directions for further work on multimodal hyperbole detection.