Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions ("AI Psychosis") and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the "black box" of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.

翻译：大型语言模型（LLMs）在心理健康支持中的应用日益广泛；然而，当前的安全基准测试往往难以检测治疗性对话中固有的复杂、纵向风险。我们提出一种评估框架，该框架将AI心理治疗师与配备动态认知情感模型的模拟患者智能体配对，并依据全面的护理质量与风险本体论对治疗会话模拟进行评估。我们将此框架应用于一个高影响力测试案例——酒精使用障碍，针对六款AI智能体（包括ChatGPT、Gemini和Character AI），评估其与一个包含15种代表不同临床表型的患者角色、经过临床验证的队列的交互表现。我们的大规模模拟（N=369个会话）揭示了AI用于心理健康支持时存在的关键安全缺陷。我们识别出特定的医源性风险，包括对患者妄想的确认（“AI精神病”）以及对自杀风险未能有效降级。最后，我们与包括AI工程师和红队测试人员、心理健康专业人员以及政策专家在内的多元利益相关者（N=9）共同验证了一个交互式数据可视化仪表板，证明该框架能有效帮助利益相关者审计AI心理治疗的“黑箱”。这些发现凸显了AI提供心理健康支持所面临的重大安全风险，以及在部署前进行基于模拟的临床红队测试的必要性。

相关内容

关注 7104

人工智能杂志AI(Artificial Intelligence)是目前公认的发表该领域最新研究成果的主要国际论坛。该期刊欢迎有关AI广泛方面的论文，这些论文构成了整个领域的进步，也欢迎介绍人工智能应用的论文，但重点应该放在新的和新颖的人工智能方法如何提高应用领域的性能，而不是介绍传统人工智能方法的另一个应用。关于应用的论文应该描述一个原则性的解决方案，强调其新颖性，并对正在开发的人工智能技术进行深入的评估。官网地址：http://dblp.uni-trier.de/db/journals/ai/

基于大语言模型的医疗推理研究：综述与 MR-Bench 基准测试

专知会员服务

15+阅读 · 4月13日

迈向个性化大语言模型驱动的智能体：基础、评估与未来方向

专知会员服务

27+阅读 · 2月27日

评估大语言模型在科学发现中的作用

专知会员服务

19+阅读 · 2025年12月19日

北大团队发布首篇大语言模型心理测量学系统综述：评估、验证、增强

专知会员服务

10+阅读 · 2025年5月27日