Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that the performance of today's LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.