Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model

18 September 2023

Stefano De Paoli

Large Language Models (LLMs) have emerged as powerful generative Artificial Intelligence solutions that can be applied across many fields and areas of work. This paper presents the results of, and reflections on, an experiment that used the model GPT-3.5-Turbo to emulate some aspects of an inductive Thematic Analysis. Previous research on this subject has largely focused on deductive analysis. Thematic Analysis is a qualitative method of analysis commonly used in the social sciences. It relies on interpretations made by the human analyst(s) and on the identification of explicit and latent meanings in qualitative data. Attempting an analysis based on human interpretation with an LLM is clearly a provocation, but it is also a way to learn how these systems can or cannot be used in qualitative research. The paper presents the motivations for attempting this emulation, reflects on how the six steps of Thematic Analysis proposed by Braun and Clarke can be reproduced, at least partially, with the LLM, and examines the outputs the model produces. The paper uses two existing open-access datasets of semi-structured interviews that other researchers had previously analysed with Thematic Analysis, and compares those earlier analyses (and the related Themes) with the results produced by the LLM. The results show that the model can infer, at least partially, some of the main Themes. The objective of the paper is not to replace human analysts in qualitative analysis but to learn whether some elements of LLM data manipulation can, to an extent, support qualitative research.
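The abstract does not give the prompts or code used in the experiment. As a rough illustration only, the Python sketch below shows how two of Braun and Clarke's phases (generating initial codes, then searching for themes) might be handed to a chat model. Everything in it is a hypothetical assumption, not the author's method: the function names, the prompt wording, the word-based chunking of transcripts, and the `llm` callable, which stands in for a chat-completion request to GPT-3.5-Turbo so the sketch runs offline.

```python
from typing import Callable, List

# Hypothetical: any function that sends a prompt to a chat model and returns
# its text reply. Injected so the sketch needs no network access or API key.
LLM = Callable[[str], str]


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def parse_lines(reply: str) -> List[str]:
    """Turn a one-item-per-line model reply into a list, dropping bullets and blanks."""
    items = (line.strip().lstrip("-*• ").strip() for line in reply.splitlines())
    return [item for item in items if item]


def initial_codes(transcript: str, llm: LLM, max_words: int = 500) -> List[str]:
    """Phase 2 (generating initial codes): ask the model for codes for each chunk."""
    codes: List[str] = []
    for chunk in chunk_words(transcript, max_words):
        prompt = (
            "Identify initial codes in the following interview excerpt. "
            "Return one short code per line.\n\n" + chunk
        )
        codes.extend(parse_lines(llm(prompt)))
    return codes


def candidate_themes(codes: List[str], llm: LLM) -> List[str]:
    """Phase 3 (searching for themes): ask the model to group the codes into themes."""
    unique = list(dict.fromkeys(codes))  # deduplicate, keeping first-seen order
    prompt = (
        "Group the following codes into a small number of themes. "
        "Return one theme name per line.\n\n" + "\n".join(unique)
    )
    return parse_lines(llm(prompt))
```

Injecting the model as a plain callable keeps the sketch testable with a stub and leaves the choice of client library open. Any candidate themes it returns would still need the human review, definition, and comparison steps the paper describes.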

翻译：大型语言模型（LLMs）已成为强大的生成式人工智能解决方案，可应用于多个领域和工作场景。本文展示了一项实验的结果与反思，该实验利用GPT 3.5-Turbo模型模拟归纳式主题分析的若干环节。先前相关研究主要聚焦于演绎式分析。主题分析是社会学科常用的一种质性分析方法，基于人类分析者的解读以及对质性数据中显性和隐性意义的识别。尝试用大型语言模型进行基于人类解读的分析，既是一种挑战，也是了解此类系统能否以及如何应用于质性研究的途径。本文阐述了开展此模拟实验的动因，反思了Braun与Clarke提出的主题分析六步骤如何至少能在LLM中部分复现，并探讨了模型输出的特征。研究采用了两组已由其他研究者通过主题分析法完成分析的开放式半结构化访谈公开数据集，将先前生成的分析结果（及相关主题）与LLM的输出进行对比。结果显示，模型能够至少部分推断出若干核心主题。本文的目标并非以模型替代人类分析者进行质性分析，而是探究LLM的数据处理能力是否能在一定程度上为质性研究提供支持。