Large language models (LLMs) have shown promise for mental health support, yet training such models is constrained by the scarcity and sensitivity of real counseling dialogues. In this article, we present MindChat, a privacy-preserving LLM for mental health support, together with MindCorpus, a synthetic multi-turn counseling dataset constructed via a multi-agent role-playing framework. To synthesize high-quality counseling data, our dialogue-construction framework employs a dual closed-loop feedback design that integrates psychological expertise and counseling techniques through role-playing: (i) turn-level critique-and-revision to improve coherence and counseling appropriateness within a session, and (ii) session-level strategy refinement to progressively enrich counselor behaviors across sessions. To mitigate privacy risks under decentralized data ownership, we fine-tune the base model with federated learning over parameter-efficient LoRA adapters and incorporate differentially private optimization to reduce membership-inference and memorization risks. Experiments on synthetic-data quality assessment and counseling capability evaluation show that MindCorpus improves training effectiveness and that MindChat is competitive with existing general-purpose and counseling-oriented LLM baselines under both automatic LLM-judge and human evaluation protocols, while exhibiting reduced privacy leakage under membership inference attacks.
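To make the dual closed-loop design concrete, the following is a minimal Python sketch of the turn-level critique-and-revision loop and the session-level strategy refinement step. The `llm` completion callable, the agent prompts, the `max_revisions` budget, and the "PASS" convention are illustrative assumptions, not the paper's exact prompts or protocol.

```python
def llm(prompt: str) -> str:
    """Placeholder for any chat-completion backend (assumption)."""
    raise NotImplementedError

def counselor_turn(history: list[str], strategy: str, max_revisions: int = 3) -> str:
    """Generate one counselor reply, revised under turn-level critique."""
    context = "\n".join(history)
    reply = llm(f"Strategy notes:\n{strategy}\nDialogue so far:\n{context}\nCounselor:")
    for _ in range(max_revisions):
        # Turn-level loop: a critic agent checks coherence and counseling
        # appropriateness of the candidate reply within the session.
        critique = llm(
            "You are a counseling supervisor. Review the reply below for "
            f"coherence and counseling appropriateness.\nDialogue:\n{context}\n"
            f"Reply: {reply}\nAnswer PASS, or give a critique."
        )
        if critique.strip().startswith("PASS"):
            break
        # Revise the turn in place using the critic's feedback.
        reply = llm(
            f"Revise the counselor reply.\nDialogue:\n{context}\n"
            f"Draft: {reply}\nCritique: {critique}\nRevised counselor reply:"
        )
    return reply

def refine_strategy(strategy: str, finished_session: list[str]) -> str:
    """Session-level loop: after a full session, a reviewer updates the
    strategy notes, progressively enriching counselor behaviors that later
    synthesized sessions build on."""
    return llm(
        f"Current strategy notes:\n{strategy}\n"
        f"Completed session:\n{chr(10).join(finished_session)}\n"
        "Update the strategy notes with one concrete improvement:"
    )
```

The two loops operate at different timescales by design: the critic repairs individual turns before they enter the corpus, while the strategy refinement only fires between sessions, so diversity accumulates without destabilizing any single dialogue.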
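The privacy mechanism can likewise be sketched in a few lines: each client takes a DP-SGD-style step on its local LoRA matrices (per-example gradient clipping plus Gaussian noise), and the server averages only the adapter parameters, FedAvg-style, while the base model stays frozen. The shapes, clip norm `C`, noise multiplier `SIGMA`, and learning rate below are toy assumptions and are not calibrated to any stated privacy budget.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 64, 8                   # frozen-weight dimension and LoRA rank (assumed)
C, SIGMA, LR = 1.0, 0.8, 0.1   # clip norm, noise multiplier, learning rate (assumed)

def dp_client_update(A, B, per_example_grads):
    """One local DP step on a client's LoRA matrices A (R x D) and B (D x R)."""
    clipped = []
    for gA, gB in per_example_grads:
        # Clip each per-example gradient to joint L2 norm at most C.
        norm = np.sqrt(np.sum(gA**2) + np.sum(gB**2))
        scale = min(1.0, C / (norm + 1e-12))
        clipped.append((gA * scale, gB * scale))
    n = len(clipped)
    # Average the clipped gradients, then add Gaussian noise scaled to the
    # clipping bound; noising the mean with std SIGMA*C/n matches noising
    # the sum with std SIGMA*C.
    gA = sum(g for g, _ in clipped) / n + rng.normal(0, SIGMA * C / n, A.shape)
    gB = sum(g for _, g in clipped) / n + rng.normal(0, SIGMA * C / n, B.shape)
    return A - LR * gA, B - LR * gB

def fedavg(adapters):
    """Server step: average LoRA adapters across clients; the base model never moves."""
    As, Bs = zip(*adapters)
    return np.mean(As, axis=0), np.mean(Bs, axis=0)

# One simulated round over three clients with random toy gradients.
A, B = np.zeros((R, D)), np.zeros((D, R))
updated = [
    dp_client_update(A, B, [(rng.normal(size=(R, D)), rng.normal(size=(D, R)))
                            for _ in range(4)])
    for _ in range(3)
]
A, B = fedavg(updated)
```

In a real deployment the noisy updates would additionally be tracked by a privacy accountant (e.g., Rényi DP composition) to report an overall (ε, δ) guarantee across training rounds.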