Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model

11 December 2023

Stefano De Paoli

Large Language Models (LLMs) have emerged as powerful generative Artificial Intelligence solutions that can be applied to many fields and areas of work. This paper presents the results of, and reflections on, an experiment that used the model GPT 3.5-Turbo to emulate some aspects of an inductive Thematic Analysis. Previous research on this subject has largely focused on deductive analysis. Thematic Analysis is a qualitative method commonly used in the social sciences; it is based on interpretations made by the human analyst(s) and on the identification of explicit and latent meanings in qualitative data. Attempting an analysis based on human interpretation with an LLM is clearly a provocation, but it is also a way to learn how these systems can or cannot be used in qualitative research. The paper presents the motivations for attempting this emulation, reflects on how the six steps of a Thematic Analysis proposed by Braun and Clarke can be at least partially reproduced with the LLM, and reflects on the outputs produced by the model. The paper uses two existing open access datasets of semi-structured interviews, previously analysed with Thematic Analysis by other researchers, and compares the previously produced analyses (and the related themes) with the results produced by the LLM. The results show that the model can infer, at least partially, some of the main themes. The objective of the paper is not to replace human analysts in qualitative analysis but to learn whether some elements of LLM data manipulation can, to an extent, support qualitative research.
