In recent years, multimodal natural language processing, which aims to learn from diverse data types, has attracted significant attention. However, the analysis of multimodal tasks in multilingual contexts remains unclear. Prior studies on sentiment analysis of tweets have predominantly focused on the English language. This paper addresses that gap by transforming an existing textual Twitter sentiment dataset into a multimodal format through a straightforward curation process. The resulting dataset opens up new avenues for sentiment-related research. We also conduct baseline experiments on this augmented dataset and report the findings. Notably, our evaluations comparing unimodal and multimodal configurations show that using a sentiment-tuned large language model as the text encoder performs exceptionally well.