Generative artificial intelligence (GenAI), particularly large language models (LLMs), has the potential to revolutionize computational social science, especially automated textual analysis. In this paper, we conduct a systematic evaluation of the promises and risks of using LLMs for diverse coding tasks, with social movement studies serving as a case example. We propose a framework for social scientists to incorporate LLMs into text annotation, either as the primary coding decision-maker or as a coding assistant. This framework provides tools for researchers to develop an optimal prompt, and to examine and report the validity and reliability of LLMs as a methodological tool. Additionally, we discuss the associated epistemic risks related to validity, reliability, replicability, and transparency. We conclude with several practical guidelines for using LLMs in text annotation tasks, and for better communicating these epistemic risks in research.