Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation.

Large language models (LLMs) are increasingly used for mental health support, yet model safeguards, particularly refusals to engage with sensitive content, remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with MHPs and with individuals who use LLMs for mental health support, we find that refusals are not isolated, single-turn system behaviors but dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy-compliance accuracy, along with design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions and recognizing them as holistic experiential processes embedded within the entire LLM design pipeline and the broader realities of mental health access.