In contemporary society, psychological health has become an increasingly prominent issue, with mental disorders growing more diverse, complex, and widespread. Cognitive Behavioral Therapy (CBT), currently the most influential and clinically effective psychological treatment with no side effects, has limited coverage and poor quality in most countries. In recent years, research on recognizing and intervening in emotional disorders with large language models (LLMs) has been validated, opening new possibilities for assisted psychological therapy. However, can LLMs truly conduct cognitive behavioral therapy? Mental health experts have raised many concerns about using LLMs for therapy. To answer this question, we collected a real CBT corpus from online video websites and designed and applied a targeted automatic evaluation framework covering three aspects: the emotion tendency of generated text, structured dialogue patterns, and proactive inquiry ability. For emotion tendency, we calculate an emotion tendency score for the CBT dialogue text generated by each model. For structured dialogue patterns, we use a diverse range of automatic evaluation metrics to compare the models' speaking style, their ability to maintain topic consistency, and their use of CBT techniques. For the ability to guide the patient through inquiry, we use the PQA (Proactive Questioning Ability) metric. We also evaluate the CBT ability of an LLM integrated with a CBT knowledge base, to explore how much additional knowledge improves the model's CBT counseling ability. Four LLM variants with excellent performance on natural language processing tasks are evaluated, and the experimental results show the great potential of LLMs in psychological counseling, especially when combined with other technological means.