Beyond the Single Turn: Reframing Refusals as Dynamic Experiences Embedded in the Context of Mental Health Support Interactions with LLMs

Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation.

Large language models (LLMs) are increasingly used for mental health support, yet model safeguards, particularly refusals to engage with sensitive content, remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with MHPs and with individuals who use LLMs for mental health support, we reveal that refusals are not isolated, single-turn system behaviors but dynamic experiences that unfold across five phases: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy, along with design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions and recognizing them as holistic experiential processes embedded within the entire LLM design pipeline and the broader realities of mental health access.
