Large Language Models (LLMs) are quickly becoming ubiquitous, but the implications for social science research are not yet well understood. This paper asks whether LLMs can help us analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees in Cox's Bazaar, Bangladesh. We find that a great deal of caution is needed in using LLMs to annotate text, as there is a risk of introducing biases that can lead to misleading inferences. Here we mean bias in the technical sense: the errors that LLMs make in annotating interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less measurement error and bias than LLM annotations. Therefore, given that some high-quality annotations are necessary in order to assess whether an LLM introduces bias, we argue that it is probably preferable to train a bespoke model on these annotations than to use an LLM for annotation.
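To make the technical sense of bias concrete, the following minimal sketch compares LLM annotation errors across groups of interview subjects. It is an illustration only, not the procedure used in the paper: the function name `error_rate_by_group`, the binary labels, and the toy data are all hypothetical, and a simple gap in error rates stands in for whatever formal test an analyst would actually apply.

```python
from collections import defaultdict


def error_rate_by_group(human, llm, group):
    """Share of items where the LLM label differs from the human label, per group.

    Assumption: human annotations are treated as the reference labels.
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for h, m, g in zip(human, llm, group):
        counts[g] += 1
        errors[g] += int(h != m)
    return {g: errors[g] / counts[g] for g in counts}


# Hypothetical toy data: 1 = the code applies to the transcript, 0 = it does not.
human = [1, 0, 1, 1, 0, 1, 0, 0]
llm = [1, 0, 1, 1, 1, 0, 0, 1]
# Hypothetical subject characteristic (two arbitrary groups).
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = error_rate_by_group(human, llm, group)
# A gap far from zero suggests errors are not random with respect to the characteristic.
gap = rates["b"] - rates["a"]
print(rates, gap)
```

If the error rate were similar in every group, the LLM's mistakes would add noise but not bias in this sense; a large gap, as in the toy data, is the pattern that can distort downstream inferences. Note that computing the gap requires human annotations in the first place.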