This paper reflects on the process of performing Thematic Analysis with Large Language Models (LLMs), focusing on the analytical saturation of the initial codes that LLMs produce. Thematic Analysis is a well-established qualitative analysis method composed of interlinked phases. A key phase is initial coding, in which analysts assign labels to discrete components of a dataset. Saturation is one way to assess the validity of a qualitative analysis and relates to the recurrence and repetition of initial codes. We reflect on how well LLMs achieve analytical saturation and propose a novel technique for measuring Inductive Thematic Saturation (ITS). The technique leverages a programming framework called DSPy and allows a precise measurement of ITS.