Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions ("AI Psychosis") and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the "black box" of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.
翻译:大型语言模型(LLMs)在心理健康支持中的应用日益广泛;然而,当前的安全基准测试往往难以检测治疗性对话中固有的复杂、纵向风险。我们提出一种评估框架,该框架将AI心理治疗师与配备动态认知情感模型的模拟患者智能体配对,并依据全面的护理质量与风险本体论对治疗会话模拟进行评估。我们将此框架应用于一个高影响力测试案例——酒精使用障碍,针对六款AI智能体(包括ChatGPT、Gemini和Character AI),评估其与一个包含15种代表不同临床表型的患者角色、经过临床验证的队列的交互表现。我们的大规模模拟(N=369个会话)揭示了AI用于心理健康支持时存在的关键安全缺陷。我们识别出特定的医源性风险,包括对患者妄想的确认(“AI精神病”)以及对自杀风险未能有效降级。最后,我们与包括AI工程师和红队测试人员、心理健康专业人员以及政策专家在内的多元利益相关者(N=9)共同验证了一个交互式数据可视化仪表板,证明该框架能有效帮助利益相关者审计AI心理治疗的“黑箱”。这些发现凸显了AI提供心理健康支持所面临的重大安全风险,以及在部署前进行基于模拟的临床红队测试的必要性。