This paper presents a set of reflections on saturation and on the use of Large Language Models (LLMs) to perform Thematic Analysis (TA). It suggests that initial thematic saturation (ITS) could serve as a metric for assessing part of the transactional validity of TA performed with LLMs, focusing on the initial coding phase. The paper presents the initial coding of two datasets of different sizes and reflects on how the LLM reaches a form of analytical saturation during coding. The proposed procedure produces two codebooks: one containing all cumulative initial codes and the other containing only the unique codes. The paper then proposes a metric that summarizes ITS with a simple calculation: the ratio between the slopes of the cumulative-code and unique-code curves. The paper contributes to the emerging body of work exploring how to perform qualitative analysis with LLMs.
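The abstract describes the ITS metric only as a ratio of slopes. The following is a minimal sketch under several assumptions that are not taken from the paper: codes arrive per interview (or chunk) in coding order; two codes count as the same "unique" code when they match after lowercasing and trimming whitespace; each slope is an ordinary least-squares fit against the interview index; and the ratio is taken as unique-code slope over cumulative-code slope. The function names `ols_slope` and `its_ratio` are hypothetical. Under these assumptions the ratio lies between 0 and 1: values near 1 mean almost every new code is new, while values near 0 mean the LLM is mostly repeating codes it has already produced, which would suggest saturation.

```python
from typing import Sequence


def ols_slope(ys: Sequence[float]) -> float:
    """Least-squares slope of ys against x = 1, 2, ..., len(ys)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def its_ratio(codes_per_item: Sequence[Sequence[str]]) -> float:
    """Hypothetical ITS sketch: slope(unique codes) / slope(cumulative codes).

    codes_per_item holds the initial codes produced for each interview or
    chunk, in coding order. Treating codes as duplicates when they match
    after lowercasing and stripping whitespace is an assumption; the paper
    may identify unique codes differently.
    """
    seen = set()
    cumulative, unique = [], []
    total = 0
    for codes in codes_per_item:
        total += len(codes)                          # cumulative codebook size
        seen.update(c.strip().lower() for c in codes)
        cumulative.append(total)
        unique.append(len(seen))                     # unique codebook size
    slope_cum = ols_slope(cumulative)
    if slope_cum == 0:
        raise ValueError("cumulative code count does not grow")
    return ols_slope(unique) / slope_cum


if __name__ == "__main__":
    items = [["a", "b", "c"], ["a", "d", "e"], ["b", "c", "f"], ["a", "b", "c"]]
    print(its_ratio(items))
```

In the usage example, the cumulative counts are 3, 6, 9, 12 (slope 3) and the unique counts are 3, 5, 6, 6 (slope 1), so the sketch returns about 0.33. Fitting a least-squares line rather than comparing only the first and last points keeps the estimate less sensitive to a single unusual interview, but this is a design choice of the sketch, not necessarily the paper's calculation.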