Thematic analysis and other variants of inductive coding are widely used qualitative analytic methods within empirical legal studies (ELS). We propose a novel framework that facilitates effective collaboration between a legal expert and a large language model (LLM) in generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4). We applied the framework to a dataset (n=785) of facts descriptions from criminal court opinions regarding thefts, with the goal of discovering classes of typical thefts. Our results show that the LLM, namely OpenAI's GPT-4, generated reasonable initial codes and was capable of improving their quality based on expert feedback. The results also suggest that the model performed well in zero-shot classification of the facts descriptions in terms of the themes. Finally, the themes autonomously discovered by the LLM appear to map fairly well to the themes arrived at by legal experts. Legal researchers can use these findings to guide decisions about integrating LLMs into their thematic analyses and other inductive coding projects.