Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Applying large language models (LLMs) to assist in psycho-counseling is an emerging and meaningful approach, driven by the significant gap between patient needs and the availability of mental health support. However, current LLMs struggle to consistently provide effective responses to client speech. This is largely because they lack supervision from high-quality real psycho-counseling data, whose content is typically inaccessible due to client privacy concerns. Furthermore, the quality of therapists' responses in available sessions can vary significantly depending on their professional training and experience, and assessing the quality of these responses remains an open challenge. In this work, we address these challenges by first proposing a set of professional and comprehensive principles for evaluating therapists' responses to client speech. Using these principles, we create a preference dataset, PsychoCounsel-Preference, which contains 36k high-quality preference comparison pairs. This dataset aligns with the preferences of professional psychotherapists, providing a robust foundation for evaluating and improving LLMs in psycho-counseling. Experiments on reward modeling and preference learning demonstrate that PsychoCounsel-Preference is an excellent resource for LLMs to acquire the essential skills needed to respond to clients in a counseling session. Our best-aligned model, PsychoCounsel-Llama3-8B, achieves a win rate of 87% against GPT-4o. We release PsychoCounsel-Preference, PsychoCounsel-Llama3-8B, and the reward model PsychoCounsel-Llama3-8B-Reward to facilitate research on psycho-counseling with LLMs at: https://hf.co/Psychotherapy-LLM.