Use of large language models such as ChatGPT (GPT-4) for mental health support has grown rapidly, emerging as a promising route to assess and help people with mood disorders such as depression. However, we have a limited understanding of GPT-4's schema of mental disorders, that is, how it internally associates and interprets symptoms. In this work, we leveraged contemporary measurement theory to decode how GPT-4 interrelates depressive symptoms, to inform both clinical utility and theoretical understanding. We found that GPT-4's assessment of depression: (a) had high overall convergent validity (r = .71 with self-report on 955 samples, and r = .81 with expert judgments on 209 samples); (b) had moderately high internal consistency (inter-symptom correlations of r = .23 to .78) that largely aligned with the literature and with self-report; with the exceptions that GPT-4 (c) underemphasized the relationship of suicidality, and overemphasized that of psychomotor symptoms, with other symptoms, and (d) had symptom inference patterns that suggest nuanced hypotheses (e.g., sleep and fatigue are influenced by most other symptoms, while feelings of worthlessness/guilt are mostly influenced by depressed mood).