The current work investigates the capability of a large language model (LLM) explicitly trained on large corpora of medical knowledge (Med-PaLM 2) to predict psychiatric functioning from patient interviews and clinical descriptions without task-specific training. To assess this, n = 145 depression assessments, n = 115 PTSD assessments, and n = 46 clinical case studies spanning high-prevalence/high-comorbidity disorders (depressive, anxiety, psychotic, trauma- and stressor-related, and addictive disorders) were analyzed using prompts to extract estimated clinical scores and diagnoses. Results demonstrate that Med-PaLM 2 is capable of assessing psychiatric functioning across a range of psychiatric conditions. Its strongest performance was in predicting depression scores from standardized assessments (accuracy range = 0.80 to 0.84), and these predictions were statistically indistinguishable from those of human clinical raters, t(144) = 1.20, p = 0.23. These results show the potential for general clinical language models to flexibly predict psychiatric risk from free-text descriptions of functioning provided by both patients and clinicians.