ChatGPT has shown the potential of emerging general artificial intelligence capabilities, demonstrating competent performance across many natural language processing tasks. In this work, we evaluate the ability of ChatGPT to perform text classification on three affective computing problems: Big-Five personality prediction, sentiment analysis, and suicide tendency detection. We compare it against three baselines: a robust language model (RoBERTa-base), a legacy word model with pretrained embeddings (Word2Vec), and a simple bag-of-words model (BoW). Results show that RoBERTa trained for a specific downstream task generally achieves superior performance. ChatGPT, on the other hand, provides decent results that are broadly comparable to those of the Word2Vec and BoW baselines. ChatGPT is also robust to noisy data, whereas the Word2Vec models achieve worse results in the presence of noise. Overall, the results indicate that ChatGPT is a good generalist model, capable of achieving good results across various problems without any specialised training; however, it is not as good as a model specialised for a downstream task.
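To make the simplest of the three baselines concrete, the following is a minimal sketch of bag-of-words featurisation in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the tokeniser, the helper names (`tokenize`, `build_vocabulary`, `bow_vector`), and the toy sentences are all hypothetical, and the classifier trained on top of these features is not specified in the abstract, so it is omitted here.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokeniser)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token in the corpus to a column index (sorted for determinism)."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(text, vocabulary):
    """Return the token-count vector of `text`; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Toy corpus invented for illustration only (not drawn from the paper's datasets).
train_texts = ["I love this film", "I hate this film"]
vocabulary = build_vocabulary(train_texts)
features = [bow_vector(t, vocabulary) for t in train_texts]
```

Each text becomes a fixed-length count vector that discards word order, which is why such a model serves as a simple reference point next to embedding-based and fine-tuned language models.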