The growing demand for scalable psychological counseling highlights the need for high-quality, privacy-compliant data, yet such data remains scarce. Here we introduce MAGneT, a novel multi-agent framework for synthetic psychological counseling session generation that decomposes counselor response generation into coordinated sub-tasks handled by specialized LLM agents, each modeling a key psychological technique. Unlike prior single-agent approaches, MAGneT better captures the structure and nuance of real counseling. We further propose a unified evaluation framework that consolidates diverse automatic metrics and expands expert assessment from four to nine counseling dimensions, thus addressing inconsistencies in prior evaluation protocols. Empirically, MAGneT substantially outperforms existing methods: experts prefer MAGneT-generated sessions in 77.2% of cases, and sessions generated by MAGneT score 3.2% higher on general counseling skills and 4.3% higher on CBT-specific skills on the Cognitive Therapy Rating Scale (CTRS). An open-source Llama3-8B-Instruct model fine-tuned on MAGneT-generated data also outperforms models fine-tuned on baseline synthetic datasets by 6.9% on average on the CTRS. We also make our code and data public.