Millions of people now use non-clinical Large Language Model (LLM) tools like ChatGPT for mental well-being support. This paper investigates what it means to design such tools responsibly, and how to operationalize that responsibility in their design and evaluation. By interviewing experts and analyzing related regulations, we found that designing an LLM tool responsibly involves: (1) Articulating the specific benefits it guarantees and for whom. Does it guarantee specific, proven relief, like an over-the-counter drug, or offer minimal guarantees, like a nutritional supplement? (2) Specifying the LLM tool's "active ingredients" for improving well-being and whether it guarantees their effective delivery (like a primary care provider) or not (like a yoga instructor). These specifications outline an LLM tool's pertinent risks, appropriate evaluation metrics, and the respective responsibilities of LLM developers, tool designers, and users. These analogies - LLM tools as supplements, drugs, yoga instructors, and primary care providers - can scaffold further conversations about their responsible design.