Although artificial intelligence (AI) shows growing promise for mental health care, current approaches to evaluating AI tools in this domain remain fragmented and poorly aligned with clinical practice, social context, and first-hand user experience. This paper argues for a rethinking of responsible evaluation -- what is measured, by whom, and for what purpose -- and introduces an interdisciplinary framework that integrates clinical soundness, social context, and equity to provide a structured basis for evaluation. Through an analysis of 135 recent *CL publications, we identify recurring limitations: over-reliance on generic metrics that do not capture clinical validity, therapeutic appropriateness, or user experience; limited participation from mental health professionals; and insufficient attention to safety and equity. To address these gaps, we propose a taxonomy of AI mental health support types -- assessment-, intervention-, and information synthesis-oriented -- each with distinct risks and evaluative requirements, and illustrate its use through case studies.