Large Language Models (LLMs) are quickly becoming ubiquitous, but their implications for social science research are not yet well understood. This paper asks whether LLMs can help us analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees in Cox's Bazaar, Bangladesh. We find that a great deal of caution is needed when using LLMs to annotate text, as there is a risk of introducing biases that can lead to misleading inferences. Here we mean bias in the technical sense: the errors that LLMs make in annotating interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less measurement error and bias than LLM annotations. Therefore, given that some high-quality annotations are necessary in order to assess whether an LLM introduces bias, we argue that it is probably preferable to train a bespoke model on these annotations than to use an LLM for annotation.
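To make the technical sense of bias concrete, the following is a minimal sketch of one way to check whether LLM annotation errors are related to a subject characteristic. It is an illustration only and not the procedure used in the paper: the toy labels, the group names `group_a` and `group_b`, and the helper functions `error_counts_by_group` and `two_proportion_z` are all hypothetical, and the two-proportion z-test is just one simple choice of test.

```python
from math import sqrt


def error_counts_by_group(human, llm, group):
    """Count LLM annotation errors (disagreements with human labels) per group.

    Returns a dict mapping each group to a tuple (errors, total).
    """
    counts = {}
    for h, l, g in zip(human, llm, group):
        entry = counts.setdefault(g, [0, 0])
        entry[0] += int(h != l)  # an "error" is any mismatch with the human label
        entry[1] += 1
    return {g: (e, n) for g, (e, n) in counts.items()}


def two_proportion_z(e1, n1, e2, n2):
    """z statistic for the null hypothesis that both groups share one error rate."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0


# Hypothetical toy data: binary annotations for 12 interview excerpts.
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]  # treated as ground truth
llm = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1]    # LLM annotations of the same items
group = ["group_a"] * 6 + ["group_b"] * 6     # an assumed subject characteristic

counts = error_counts_by_group(human, llm, group)
z = two_proportion_z(*counts["group_a"], *counts["group_b"])
print(counts, round(z, 3))
```

In this toy example all of the LLM's errors fall in one group, so the errors are not random with respect to the characteristic. With samples this small the normal approximation behind the z statistic is rough, so a real analysis would need more data and probably a regression of the error indicator on several subject characteristics at once.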