Throughout history, a prevailing paradigm in mental healthcare has been one in which distressed people may receive treatment with little understanding of how their experience is perceived by their care provider or, in turn, of the decisions their provider makes about how treatment will progress. Paralleling this offline model of care, people who seek mental health support from artificial intelligence (AI)-based chatbots are similarly given little context for how their expressions of distress are processed by the model, or for any reasoning or theoretical grounding that may underlie its responses. People in severe distress who turn to AI chatbots for support thus find themselves caught between black boxes, contending with unique forms of agony that arise from these intersecting opacities. In this paper, we argue that the distinct psychological state of individuals experiencing severe mental distress necessitates a higher standard of end-user interpretability than general AI chatbot use. We propose a reflective interpretability approach to AI-mediated mental health support, which nudges users to engage in an agency-preserving, iterative process of reflecting on and interpreting model outputs, toward creating meaning from interactions rather than accepting outputs as directive instructions. Drawing on interpretability practices from four mental health fields (psychotherapy, crisis intervention, psychiatry, and care authorization), we describe concrete design approaches for reflective interpretability in AI-mediated mental health support, including role induction, prosocial advance directives, intervention titration, and well-defined mechanisms for recourse, alongside a discussion of potential risks and mitigation measures.