The release of ChatGPT has uncovered a range of possibilities whereby large language models (LLMs) can substitute for human intelligence. In this paper, we seek to understand whether ChatGPT has the potential to reproduce human-generated label annotations in social computing tasks. Such an achievement could significantly reduce the cost and complexity of social computing research. To this end, we use ChatGPT to relabel five seminal datasets covering stance detection (two datasets), sentiment analysis, hate speech detection, and bot detection. Our results highlight that ChatGPT does have the potential to handle these data annotation tasks, although a number of challenges remain. ChatGPT obtains an average accuracy of 0.609. Performance is highest for the sentiment analysis dataset, where ChatGPT correctly annotates 64.9% of tweets. However, we show that performance varies substantially across individual labels. We believe this work can open up new lines of analysis and act as a basis for future research into the use of ChatGPT for human annotation tasks.
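As an illustration of how agreement with human annotations can be scored, the following is a minimal sketch, not the paper's evaluation code. It computes overall accuracy (the share of items where the model label matches the human label), a per-label breakdown (for each human label, the share of its items the model reproduced, i.e. per-class recall), and an average across datasets. The function names and toy labels are hypothetical, and treating "average accuracy" as an unweighted mean of per-dataset accuracies is an assumption.

```python
from collections import Counter


def annotation_accuracy(gold, predicted):
    """Fraction of items where the model label equals the human (gold) label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no items to score")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def per_label_accuracy(gold, predicted):
    """For each human label, the fraction of its items the model labelled identically."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: hits[label] / n for label, n in totals.items()}


def average_accuracy(per_dataset_accuracies):
    """Unweighted mean of per-dataset accuracies (assumed averaging scheme)."""
    return sum(per_dataset_accuracies) / len(per_dataset_accuracies)


# Hypothetical toy labels for illustration only; these are not data from the paper.
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

print(annotation_accuracy(gold, pred))  # 4 of 6 labels match
print(per_label_accuracy(gold, pred))   # "positive" fully reproduced, the others only half
```

Even in this toy case, a moderate overall accuracy hides a large gap between labels, which is why a per-label breakdown is worth reporting alongside the headline figure.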