Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model

24 May 2023

Stefano De Paoli

Large Language Models (LLMs) have emerged as powerful generative Artificial Intelligence solutions that can be applied to several fields and areas of work. This paper presents the results of, and reflections on, an experiment that used the GPT-3.5-Turbo model to emulate some aspects of an inductive Thematic Analysis. Previous research on this subject has largely focused on deductive analysis. Thematic Analysis is a qualitative method of analysis commonly used in the social sciences; it relies on interpretations made by the human analyst(s) and on the identification of explicit and latent meanings in qualitative data. Attempting an analysis grounded in human interpretation with an LLM is clearly a provocation, but it is also a way to learn how these systems can or cannot be used in qualitative research. The paper presents the motivations for attempting this emulation, reflects on how the six steps of Thematic Analysis proposed by Braun and Clarke can be at least partially reproduced with the LLM, and examines the outputs produced by the model. The paper uses two existing open-access datasets of semi-structured interviews that other researchers had previously analysed with Thematic Analysis, and compares those earlier analyses (and their themes) with the results produced by the LLM. The results show that the model can infer, at least partially, some of the main themes. The objective of the paper is not to replace human analysts in qualitative analysis but to learn whether some elements of LLM data manipulation can, to an extent, support qualitative research.
