Sentiment analysis is a crucial task that aims to understand people's emotional states and predict emotional categories from multimodal information. It comprises several subtasks, such as emotion recognition in conversation (ERC), aspect-based sentiment analysis (ABSA), and multimodal sentiment analysis (MSA). However, unifying all of these subtasks presents numerous challenges, including modality alignment, unified input/output forms, and dataset bias. To address these challenges, we propose a Task-Specific Prompt method to jointly model the subtasks and introduce a multimodal generative framework called UniSA. Additionally, we organize the benchmark datasets of the main subtasks into a new Sentiment Analysis Evaluation benchmark, SAEval. We design novel pre-training tasks and training methods that enable the model to learn generic sentiment knowledge shared across subtasks, improving its multimodal sentiment perception ability. Our experimental results show that UniSA performs comparably to the state of the art on all subtasks and generalizes well to various sentiment analysis subtasks.