The Agony of Opacity: Foundations for Reflective Interpretability in AI-Mediated Mental Health Support

Historically, a prevailing paradigm in mental healthcare has been one in which distressed people may receive treatment with little understanding of how their care provider perceives their experience and, in turn, of how the provider decides the way treatment will progress. Paralleling this offline model of care, people who seek mental health support from artificial intelligence (AI)-based chatbots are similarly given little context for how the model processes their expressions of distress, or for any reasoning or theoretical grounding that may underlie its responses. People in severe distress who turn to AI chatbots for support thus find themselves caught between black boxes, contending with unique forms of agony that arise from these intersecting opacities. In this paper, we argue that the distinct psychological state of individuals experiencing severe mental distress necessitates a higher standard of end-user interpretability than general AI chatbot use does. We propose a reflective interpretability approach to AI-mediated mental health support. This approach nudges users to engage in an agency-preserving, iterative process of reflecting on and interpreting model outputs, so that they create meaning from interactions rather than accepting outputs as directive instructions. Drawing on interpretability practices from four mental health fields (psychotherapy, crisis intervention, psychiatry, and care authorization), we describe concrete design approaches for reflective interpretability in AI-mediated mental health support, including role induction, prosocial advance directives, intervention titration, and well-defined mechanisms for recourse. We also discuss potential risks and mitigation measures.
