Responsible Evaluation of AI for Mental Health

Hiba Arnaout, Anmol Goel, H. Andrew Schwartz, Steffen T. Eberhardt, Dana Atzil-Slonim, Gavin Doherty, Brian Schwartz, Wolfgang Lutz, Tim Althoff, Munmun De Choudhury, Hamidreza Jamalabadi, Raj Sanjay Shah, Flor Miriam Plaza-del-Arco, Dirk Hovy, Maria Liakata, Iryna Gurevych

Although artificial intelligence (AI) shows growing promise for mental health care, current approaches to evaluating AI tools in this domain remain fragmented and poorly aligned with clinical practice, social context, and first-hand user experience. This paper argues for a rethinking of responsible evaluation -- what is measured, by whom, and for what purpose -- by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity, providing a structured basis for evaluation. Through an analysis of 135 recent *CL publications, we identify recurring limitations, including over-reliance on generic metrics that do not capture clinical validity, therapeutic appropriateness, or user experience, limited participation from mental health professionals, and insufficient attention to safety and equity. To address these gaps, we propose a taxonomy of AI mental health support types -- assessment-, intervention-, and information synthesis-oriented -- each with distinct risks and evaluative requirements, and illustrate its use through case studies.

