Identifying scientific publications within a dynamic field of research often requires costly annotation by subject-matter experts. Resources such as widely accepted classification criteria or field taxonomies are unavailable for a domain like artificial intelligence (AI), which spans emerging topics and technologies. We address these challenges by inferring a functional definition of AI research from existing expert labels and then evaluating state-of-the-art chatbot models on the task of expert data annotation. Using the arXiv publication database as ground truth, we experiment with prompt engineering for GPT chatbot models to identify an alternative, automated expert-annotation pipeline that assigns AI labels with 94% accuracy. For comparison, we fine-tune SPECTER, a transformer language model pre-trained on scientific publications, which achieves 96% accuracy (only two percentage points higher than GPT) on classifying AI publications. Our results indicate that, with effective prompt engineering, chatbots can serve as reliable data annotators even where subject-area expertise is required. To evaluate the utility of chatbot-annotated datasets on downstream classification tasks, we train a new classifier on GPT-labeled data and compare its performance to the arXiv-trained model. The classifier trained on GPT-labeled data outperforms the arXiv-trained model by nine percentage points, achieving 82% accuracy.