Building trustworthy AI systems for mental health support is a shared priority across stakeholders from multiple disciplines. However, "trustworthy" remains loosely defined and inconsistently operationalized. AI research often focuses on technical criteria (e.g., robustness, explainability, and safety), while therapeutic practitioners emphasize therapeutic fidelity (e.g., appropriateness, empathy, and long-term user outcomes). To bridge this fragmented landscape, we propose a three-layer trust framework covering human-oriented, AI-oriented, and interaction-oriented trust, which integrates the viewpoints of key stakeholders (e.g., practitioners, researchers, and regulators). Using this framework, we systematically review existing AI-driven research in the mental health domain and examine how trustworthiness is evaluated, with practices ranging from automatic metrics to clinically validated approaches. We highlight critical gaps between what NLP currently measures and what real-world mental health contexts require, and outline a research agenda for building socio-technically aligned and genuinely trustworthy AI for mental health support.