This report provides a detailed description of the method we explored and proposed for the WECIA Emotion Prediction Competition (EPC), whose task is to predict a person's emotion from an artwork and an accompanying comment. The competition uses the ArtELingo dataset, which is designed to encourage work on diversity across languages and cultures. The dataset poses two main challenges: modality imbalance and language-cultural differences. To address these challenges, we propose a simple yet effective approach called single-multi modal with Emotion-Cultural specific prompt (ECSP), which uses single-modal information to enhance the performance of multimodal models and a well-designed prompt to reduce the effect of cultural differences. Specifically, our approach contains two main blocks: (1) an XLM-R\cite{conneau2019unsupervised}-based unimodal model and an X$^2$-VLM\cite{zeng2022x}-based multimodal model, and (2) the Emotion-Cultural specific prompt. Our approach ranked first in the final test with a score of 0.627.