Building on a human-led thematic analysis of life-story interviews with inpatients with Borderline Personality Disorder, this study examines the capacity of large language models (OpenAI's GPT, Google's Gemini, and Anthropic's Claude) to support qualitative clinical analysis. The models were evaluated through a mixed, three-study procedure. In Study A, expert judges in phenomenology and clinical psychology rated the outputs under blinded and non-blinded conditions; assessments covered semantic congruence, Jaccard coefficients for the overlap between human and model outputs, and multidimensional validity ratings of credibility, coherence, substantiveness, and grounding in the qualitative data. In Study B, neural embedding methods were used to map the human- and model-generated theme descriptions into a two-dimensional vector space, providing a computational measure of the semantic and stylistic distance between human and model outputs. In Study C, complementary non-expert evaluations examined how thematic verbosity influences perceptions of human authorship and content validity. Across all three studies, results revealed variable overlap with the human analysis: the models were partly indistinguishable from the human researchers and also identified themes the human analysis had originally omitted. The findings highlight both the variability and the potential of AI-augmented thematic qualitative analysis to mitigate human interpretative bias and enhance sensitivity.
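The two quantitative measures named above can be illustrated with a minimal sketch. All theme labels and vectors here are hypothetical stand-ins, not study data: `jaccard` shows the overlap coefficient used in Study A, and `cosine_distance` shows one plausible way to compare theme descriptions in a low-dimensional embedding space as in Study B (where, in practice, the vectors would come from a sentence-embedding model rather than being hand-written).

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B| over two theme sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_distance(u, v) -> float:
    """1 minus cosine similarity; smaller means closer in embedding space."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / norm


# Hypothetical theme sets from a human analyst and one model:
# three shared themes out of five distinct themes overall.
human = {"abandonment", "identity diffusion", "emptiness", "self-harm"}
model = {"abandonment", "identity diffusion", "emptiness", "dissociation"}
print(jaccard(human, model))  # 0.6

# Stand-in two-dimensional vectors for two theme descriptions.
print(cosine_distance([0.9, 0.1], [0.7, 0.3]))
```

The Jaccard measure treats themes as unordered labels and so captures only exact overlap; the embedding distance complements it by registering graded semantic similarity between differently worded theme descriptions.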