Sentiment analysis and emotion detection are important research topics in natural language processing (NLP) and benefit many downstream tasks. With the widespread adoption of large language models (LLMs), researchers have begun exploring instruction-tuned LLMs for sentiment analysis. However, these models focus only on single aspects of affective classification (e.g., sentiment polarity or categorical emotions) and overlook regression tasks (e.g., sentiment strength or emotion intensity), which leads to poor performance in downstream tasks. The main reason is the lack of comprehensive affective instruction-tuning datasets and evaluation benchmarks that cover a variety of affective classification and regression tasks. Moreover, although emotional information is useful for downstream tasks, existing downstream datasets lack high-quality, comprehensive affective annotations. In this paper, we make three contributions: the affective analysis instruction dataset (AAID), the first multi-task dataset of its kind, with 234K samples spanning classification and regression tasks to support LLM instruction tuning; the affective evaluation benchmark (AEB), comprising 14 tasks from various sources and domains, to test the generalization ability of LLMs; and EmoLLMs, the first series of open-sourced instruction-following LLMs for comprehensive affective analysis, obtained by fine-tuning various LLMs on AAID to solve a range of affective instruction tasks. We compare EmoLLMs with a variety of LLMs on AEB: our models outperform all other open-sourced LLMs and surpass ChatGPT and GPT-4 on most tasks. These results show that EmoLLMs achieve ChatGPT-level and GPT-4-level generalization on affective analysis tasks and demonstrate that they can be used as affective annotation tools.