Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation.

Large language models (LLMs) are increasingly used for mental health support, yet model safeguards, particularly refusals to engage with sensitive content, remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with individuals who use LLMs for mental health support and with MHPs, we show that refusals are not isolated, single-turn system behaviors but dynamic, multi-phase experiences spanning five phases: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy, along with design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions toward recognizing them as holistic experiences embedded within users' support-seeking trajectories and the broader LLM design pipeline.