LLM-based client simulation has emerged as a promising tool for training novice counselors and evaluating automated counseling systems. However, existing client simulation approaches face three key challenges: (1) limited diversity and realism in client profiles, (2) the lack of a principled framework for modeling realistic client behaviors, and (3) a scarcity of resources in Chinese-language settings. To address these limitations, we propose PsyCLIENT, a novel simulation framework grounded in conversational trajectory modeling. By conditioning LLM generation on predefined real-world trajectories that incorporate explicit behavior labels and content constraints, our approach ensures diverse and realistic interactions. We further introduce PsyCLIENT-CP, the first open-source Chinese client profile dataset, covering 60 distinct counseling topics. Comprehensive evaluations involving licensed professional counselors demonstrate that PsyCLIENT significantly outperforms baselines in both authenticity and training effectiveness. Notably, the simulated clients are nearly indistinguishable from human clients, achieving an expert confusion rate of about 95\% in discrimination tasks. These findings indicate that conversational trajectory modeling effectively bridges the gap between static client profiles and dynamic, realistic simulations, offering a robust solution for mental health education and research. Code and data will be released to facilitate future research in mental health counseling.
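To make "conditioning LLM generation on predefined trajectories" concrete, here is a minimal sketch of how a per-turn client prompt might be assembled. It rests on several assumptions. A trajectory is modeled as an ordered list of steps, each pairing a behavior label with a content constraint. Every name here (`TrajectoryStep`, `ClientProfile`, `build_turn_prompt`) and all prompt wording are hypothetical illustrations, not PsyCLIENT's actual implementation. The sketch only builds the prompt string and does not call any LLM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrajectoryStep:
    # Hypothetical schema: one step of a predefined conversational trajectory.
    behavior: str    # explicit behavior label for the client's turn (assumed format)
    constraint: str  # content constraint the client's reply should respect


@dataclass
class ClientProfile:
    # Hypothetical, simplified client profile (the real dataset schema may differ).
    topic: str
    background: str


def build_turn_prompt(profile: ClientProfile,
                      trajectory: List[TrajectoryStep],
                      turn: int,
                      counselor_utterance: str) -> str:
    """Assemble a prompt that conditions the client's next reply on the
    trajectory step for this turn (an illustrative assumption)."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one step")
    # If the dialogue runs past the trajectory, keep following the last step.
    step = trajectory[min(turn, len(trajectory) - 1)]
    return (
        f"You are role-playing a counseling client. Topic: {profile.topic}.\n"
        f"Background: {profile.background}\n"
        f"The counselor just said: {counselor_utterance}\n"
        f"Reply exhibiting the behavior '{step.behavior}'. "
        f"Content constraint: {step.constraint}"
    )


if __name__ == "__main__":
    profile = ClientProfile("workplace stress", "Software engineer, recently promoted.")
    trajectory = [
        TrajectoryStep("disclosure", "mention trouble sleeping"),
        TrajectoryStep("resistance", "doubt that counseling will help"),
    ]
    print(build_turn_prompt(profile, trajectory, 0, "What brings you here today?"))
```

Keeping the behavior label and content constraint as separate fields mirrors the abstract's distinction between behavior labels and content constraints. A real system would pass the resulting string to a chat model and would likely include the dialogue history as well.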