Many natural language processing (NLP) tasks rely on labeled data to train machine learning models that achieve high performance. However, data annotation can be time-consuming and expensive, especially when the task involves a large amount of data or requires specialized domain knowledge. Recently, GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. In this paper, we first argue that large language models (LLMs) such as GPT-3.5 can serve as excellent crowdsourced annotators when given sufficient guidance and demonstration examples. To make LLMs better annotators, we propose a two-step approach, 'explain-then-annotate'. Specifically, we first create a prompt for every demonstration example and use it to ask an LLM to explain why the ground-truth answer/label was chosen for that example. We then construct a few-shot chain-of-thought prompt from these self-generated explanations and use it to annotate the unlabeled data. We conduct experiments on three tasks: user input and keyword relevance assessment, BoolQ, and WiC. On user input and keyword relevance assessment, the annotations from GPT-3.5 surpass those from crowdsourced annotators. On the other two tasks, GPT-3.5 achieves results comparable to those of crowdsourced annotation.
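The two-step explain-then-annotate procedure is essentially prompt construction, so it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the prompt templates, the `Demo` type, the `Label:` output convention, and the `llm` callable (any function mapping a prompt string to a completion) are all hypothetical, and `fake_llm` merely stands in for a real GPT-3.5 call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Demo:
    """A demonstration example with its ground-truth label."""
    text: str
    label: str


def explanation_prompt(task: str, demo: Demo) -> str:
    # Step 1 ("explain"): ask the LLM why the ground-truth label fits this example.
    # Template wording is a hypothetical choice, not the paper's exact prompt.
    return (f"{task}\n\nInput: {demo.text}\nLabel: {demo.label}\n"
            "Explain briefly why this label is correct for this input.")


def cot_prompt(task: str, demos: List[Demo], explanations: List[str], query: str) -> str:
    # Step 2 ("annotate"): few-shot chain-of-thought prompt in which every
    # demonstration shows its self-generated explanation before its label.
    parts = [task, ""]
    for demo, expl in zip(demos, explanations):
        parts += [f"Input: {demo.text}", f"Explanation: {expl}", f"Label: {demo.label}", ""]
    # The unlabeled query ends with "Explanation:" so the model reasons first.
    parts += [f"Input: {query}", "Explanation:"]
    return "\n".join(parts)


def parse_label(completion: str) -> Optional[str]:
    # Read the label from the last "Label:" line the model writes after reasoning.
    for line in reversed(completion.splitlines()):
        if line.strip().startswith("Label:"):
            return line.split("Label:", 1)[1].strip()
    return None


def explain_then_annotate(llm: Callable[[str], str], task: str,
                          demos: List[Demo], unlabeled: List[str]) -> List[Optional[str]]:
    # Generate one explanation per demonstration, then annotate each unlabeled input.
    explanations = [llm(explanation_prompt(task, d)).strip() for d in demos]
    return [parse_label(llm(cot_prompt(task, demos, explanations, x))) for x in unlabeled]


# Hypothetical stand-in model; replace with a real LLM client in practice.
def fake_llm(prompt: str) -> str:
    if prompt.endswith("Explanation:"):
        return " The input clearly asks about the keyword's topic.\nLabel: relevant"
    return "The user input refers directly to the keyword."


TASK = "Judge whether the user input is relevant to the keyword."
DEMOS = [Demo("keyword: flights | input: cheap flights to Paris", "relevant")]
labels = explain_then_annotate(fake_llm, TASK, DEMOS,
                               ["keyword: flights | input: book a flight to Rome"])
print(labels)
```

Keeping the model behind a plain callable separates prompt construction from any particular API client, so the same sketch works with whichever LLM endpoint is used.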