Mental Health AI Safety Claims Must Preserve Temporal Evidence

The safety of mental health AI is often judged at the wrong temporal scale. Current evaluations typically score isolated responses, endpoint outcomes, or aggregate dialogue quality, while clinically consequential failures may arise from the order and accumulation of interactions themselves, including delayed escalation, repeated reinforcement, dependency formation, failed repair, and gradual deterioration across turns. This paper argues that this mismatch is not merely a limitation of evaluation coverage but a source of invalid safety conclusions. We introduce Temporal Safety Non-Identifiability, a formal account of why safety properties that depend on sequence, timing, accumulation, or recovery cannot be certified by protocols that discard those features. From this formalization, we develop SCOPE (Safety Claims Over Preserved Evidence) as a general principle for aligning safety claims with the evidence an evaluation actually retains, and instantiate it for mental health as the reporting standard SCOPE-MH. We operationalize SCOPE-MH through a proof of concept on the AnnoMI dataset of expert-annotated motivational interviewing conversations, which reveals mechanisms of failure that per-turn behavior scoring does not represent. We propose SCOPE-MH as a diagnostic complement to existing evaluation infrastructure and argue that evaluation preserving temporal evidence is necessary, not optional, for safety-critical mental health AI deployment.
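The non-identifiability argument can be illustrated with a toy example. The sketch below is not the paper's formalization or the SCOPE-MH procedure; the per-turn scores, the function names, and the choice of "longest run of unsafe turns" as the order-dependent property are all assumptions made for illustration. It shows two dialogues that look identical to any protocol that discards turn order, yet differ on a property that depends on sequence.

```python
from itertools import groupby

def order_free_view(scores):
    """What an order-discarding protocol retains: the multiset of turn scores."""
    return tuple(sorted(scores))

def longest_unsafe_run(scores):
    """An order-dependent property: longest streak of consecutive unsafe turns."""
    return max((len(list(g)) for s, g in groupby(scores) if s == 0), default=0)

# Hypothetical per-turn safety scores (1 = safe turn, 0 = unsafe turn).
intermittent = [1, 0, 1, 0, 1, 0, 1, 1]  # isolated lapses, each followed by repair
sustained = [1, 1, 1, 1, 1, 0, 0, 0]     # deterioration with no recovery

# Identical retained evidence, so any score computed from it must agree.
same_evidence = order_free_view(intermittent) == order_free_view(sustained)

# The order-dependent property nevertheless differs between the dialogues.
runs = (longest_unsafe_run(intermittent), longest_unsafe_run(sustained))

print(same_evidence, runs)
```

Because every order-free summary (mean score, count of unsafe turns, worst turn) is a function of the sorted scores, none of them can separate the two dialogues, which is the sense in which a claim about sustained deterioration is not supported by evidence of that kind.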

