Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model


22 May 2023


Stefano De Paoli

Large Language Models (LLMs) have emerged as powerful generative Artificial Intelligence solutions that can be applied to several fields and areas of work. This paper presents the results of, and reflections on, an experiment that used the GPT 3.5-Turbo model to emulate some aspects of an inductive Thematic Analysis. Previous research on this subject has largely focused on deductive analysis. Thematic Analysis is a qualitative method commonly used in the social sciences; it rests on interpretations made by the human analyst(s) and on the identification of explicit and latent meanings in qualitative data. Attempting an analysis based on human interpretation with an LLM is clearly a provocation, but it is also a way to learn how these systems can or cannot be used in qualitative research. The paper presents the motivations for attempting this emulation, reflects on how the six steps of a Thematic Analysis proposed by Braun and Clarke can be at least partially reproduced with the LLM, and reflects on the outputs the model produces. The paper uses two existing open-access datasets of semi-structured interviews, previously analysed with Thematic Analysis by other researchers, and compares the previously produced analyses (and their themes) with the results produced by the LLM. The results show that the model can infer, at least partially, some of the main themes. The objective of the paper is not to replace human analysts in qualitative analysis but to learn whether some elements of LLM data manipulation can, to an extent, support qualitative research.
