A Wide Evaluation of ChatGPT on Affective Computing Tasks

With the rise of foundation models, a new artificial intelligence paradigm has emerged: solving problems by prompting general-purpose foundation models instead of training a separate machine learning model for each problem. Such models have been shown to exhibit emergent abilities, solving problems they were not initially trained on. Studies of the effectiveness of such models are still quite limited. In this work, we broadly study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems: aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotion intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection. We introduce a framework for evaluating the ChatGPT models on regression-based problems, such as intensity ranking, by modelling them as pairwise ranking classification. We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers. The results demonstrate the emergent abilities of the ChatGPT models across a wide range of affective computing problems: GPT-3.5 and especially GPT-4 perform strongly on many of them, particularly those related to sentiment, emotions, or toxicity. The ChatGPT models fall short on problems with implicit signals, such as engagement measurement and subjectivity detection.
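
The idea of recasting a regression-style intensity task as pairwise ranking classification can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`make_pairs`, `pairwise_accuracy`, `length_judge`), the `min_gap` parameter, and the toy data are all assumptions made for illustration. The `judge` callable stands in for a prompted model that is asked which of two texts expresses the higher intensity.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Example = Tuple[str, float]      # (text, gold intensity score)
Pair = Tuple[str, str, int]      # (text_a, text_b, 1 if text_a is more intense else 0)


def make_pairs(examples: List[Example], min_gap: float = 0.0) -> List[Pair]:
    """Turn scored examples into labelled pairs.

    Pairs whose gold scores differ by no more than `min_gap` (a hypothetical
    knob, not taken from the paper) are skipped, so exact ties never appear.
    """
    pairs: List[Pair] = []
    for (text_a, score_a), (text_b, score_b) in combinations(examples, 2):
        if abs(score_a - score_b) <= min_gap:
            continue
        pairs.append((text_a, text_b, 1 if score_a > score_b else 0))
    return pairs


def pairwise_accuracy(pairs: List[Pair], judge: Callable[[str, str], int]) -> float:
    """Fraction of pairs for which the judge picks the more intense text."""
    if not pairs:
        return 0.0
    correct = sum(1 for text_a, text_b, label in pairs if judge(text_a, text_b) == label)
    return correct / len(pairs)


def length_judge(text_a: str, text_b: str) -> int:
    """Toy stand-in for a prompted model: the longer text is called more intense."""
    return 1 if len(text_a) > len(text_b) else 0


# Toy data with made-up intensity scores in [0, 1].
examples = [("ok", 0.1), ("quite happy", 0.6), ("absolutely thrilled!!", 0.9)]
pairs = make_pairs(examples)
accuracy = pairwise_accuracy(pairs, length_judge)
print(f"{len(pairs)} pairs, pairwise accuracy = {accuracy:.2f}")
```

Under this framing a model never has to output a calibrated real number; it only answers a binary question per pair, and the result can be scored with ordinary classification metrics. In practice `length_judge` would be replaced by a function that sends both texts to the model in a prompt and parses its answer.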

