The emergence of large language models (LLMs) such as ChatGPT has increased interest in their use as therapists to address mental health challenges and the widespread lack of access to care. However, experts have emphasized the critical need to systematically evaluate LLM-based mental health interventions to accurately assess their capabilities and limitations. Here, we propose BOLT, a proof-of-concept computational framework for systematically assessing the conversational behavior of LLM therapists. We quantitatively measure LLM behavior across 13 psychotherapeutic approaches using in-context learning methods, then compare that behavior against high- and low-quality human therapy. Our analysis, based on Motivational Interviewing therapy, reveals that LLMs more often resemble behaviors exhibited in low-quality therapy than in high-quality therapy, such as offering a higher degree of problem-solving advice when clients share emotions. However, unlike low-quality therapy, LLMs reflect significantly more on clients' needs and strengths. Our findings caution that LLM therapists require further research before they can provide consistent, high-quality care.