With the rise of foundation models, a new artificial intelligence paradigm has emerged: rather than training a separate machine learning model for each problem, one simply prompts a general-purpose foundation model to solve it. Such models have been shown to exhibit emergent abilities, solving problems they were not initially trained on. However, studies of their effectiveness remain limited. In this work, we extensively study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems: aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotion intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection. We introduce a framework for evaluating the ChatGPT models on regression-based problems, such as intensity ranking, by modelling them as pairwise ranking classification. We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers. The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems: GPT-3.5, and especially GPT-4, showed strong performance on many problems, particularly those related to sentiment, emotions, or toxicity. The ChatGPT models fell short on problems with implicit signals, such as engagement measurement and subjectivity detection.
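The pairwise reformulation of regression-based problems can be illustrated with a minimal sketch. It assumes each item carries a gold intensity score, that pairs with tied scores are dropped, and that the model is asked a binary "which text is more intense?" question for each pair. The names `make_pairs`, `pairwise_accuracy`, and `exclamation_judge` are hypothetical, and the judge is a toy stand-in for a call to GPT-3.5 or GPT-4, not the prompting setup used in this work.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# (index_a, index_b, label): label is 1 if item a is more intense than item b, else 0.
Pair = Tuple[int, int, int]


def make_pairs(scores: Sequence[float]) -> List[Pair]:
    """Turn per-item intensity scores into labelled pairwise comparisons.

    Pairs with tied scores are skipped because a binary
    "which is more intense?" question has no correct answer for them
    (an assumption of this sketch).
    """
    pairs: List[Pair] = []
    for a, b in combinations(range(len(scores)), 2):
        if scores[a] == scores[b]:
            continue
        pairs.append((a, b, 1 if scores[a] > scores[b] else 0))
    return pairs


def pairwise_accuracy(
    texts: Sequence[str],
    pairs: Sequence[Pair],
    judge: Callable[[str, str], int],
) -> float:
    """Fraction of pairs where judge(text_a, text_b) matches the gold label."""
    if not pairs:
        raise ValueError("no untied pairs to evaluate")
    correct = sum(1 for a, b, label in pairs if judge(texts[a], texts[b]) == label)
    return correct / len(pairs)


def exclamation_judge(text_a: str, text_b: str) -> int:
    """Hypothetical stand-in for an LLM: returns 1 if text_a has more '!' marks."""
    return 1 if text_a.count("!") > text_b.count("!") else 0


if __name__ == "__main__":
    texts = ["I'm quite upset!", "I'm fine.", "I'm absolutely furious!!!"]
    scores = [0.6, 0.1, 0.9]  # toy gold intensity scores
    pairs = make_pairs(scores)
    print(pairs)
    print(pairwise_accuracy(texts, pairs, exclamation_judge))
```

Framing each comparison as a two-way choice lets a prompted model be scored with plain classification accuracy instead of requiring it to output a calibrated numeric score; the cost is that n items yield up to n(n-1)/2 queries, so in practice one might sample pairs rather than enumerate them all.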