Millions of people now use non-clinical Large Language Model (LLM) tools like ChatGPT for mental well-being support. This paper investigates what it means to design such tools responsibly, and how to operationalize that responsibility in their design and evaluation. By interviewing experts and analyzing related regulations, we found that designing an LLM tool responsibly involves: (1) Articulating the specific benefits it guarantees and for whom. Does it guarantee specific, proven relief, like an over-the-counter drug, or offer minimal guarantees, like a nutritional supplement? (2) Specifying the LLM tool's "active ingredients" for improving well-being and whether it guarantees their effective delivery (like a primary care provider) or not (like a yoga instructor). These specifications outline an LLM tool's pertinent risks, appropriate evaluation metrics, and the respective responsibilities of LLM developers, tool designers, and users. These analogies (LLM tools as supplements, drugs, yoga instructors, and primary care providers) can scaffold further conversations about their responsible design.