This study examines the performance of open-source Large Language Models (LLMs) on text annotation tasks and compares it with that of proprietary models such as ChatGPT and human-based services such as MTurk. While prior research has demonstrated the high performance of ChatGPT across numerous NLP tasks, open-source LLMs such as HuggingChat and FLAN are gaining attention for their cost-effectiveness, transparency, reproducibility, and superior data protection. We assess these models across a range of text annotation tasks using both zero-shot and few-shot approaches and different temperature settings. Our findings show that while ChatGPT achieves the best performance on most tasks, open-source LLMs not only outperform MTurk but also show potential to compete with ChatGPT on specific tasks.