Many NLP applications require manual data annotation for a variety of tasks, notably to train classifiers or to evaluate the performance of unsupervised models. Depending on their size and complexity, these tasks may be carried out by crowd-workers on platforms such as MTurk or by trained annotators, such as research assistants. Using a sample of 2,382 tweets, we demonstrate that ChatGPT outperforms crowd-workers on several annotation tasks, including relevance, stance, topic, and frame detection. Specifically, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers for four out of five tasks, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003, making it about twenty times cheaper than MTurk. These results show the potential of large language models to drastically increase the efficiency of text classification.
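The two metrics named in the abstract, accuracy and intercoder agreement, can be illustrated with a minimal Python sketch. It assumes that accuracy is the share of labels matching a reference (gold) label and that intercoder agreement is simple percent agreement between two coders or two model runs; the abstract does not specify the exact definitions. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of items whose predicted label matches the reference label."""
    if len(predicted) != len(gold):
        raise ValueError("label sequences must have the same length")
    if not gold:
        raise ValueError("label sequences must not be empty")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of items that two coders (or two model runs) label identically.

    Assumption: simple percent agreement, with no chance correction.
    """
    return accuracy(coder_a, coder_b)


# Toy labels, invented for illustration only (not data from the study).
gold = ["relevant", "irrelevant", "relevant", "relevant"]
run_1 = ["relevant", "irrelevant", "irrelevant", "relevant"]
run_2 = ["relevant", "irrelevant", "irrelevant", "relevant"]

print(accuracy(run_1, gold))            # 0.75
print(percent_agreement(run_1, run_2))  # 1.0
```

The toy example also shows why the two metrics are reported separately: two runs can agree with each other perfectly while both disagreeing with the reference label on the same item.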