To develop a reliable AI for psychological assessment, we introduce \texttt{PsychEval}, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges. \textbf{1) Can we train a highly realistic AI counselor?} Realistic counseling is a longitudinal task that requires sustained memory and dynamic goal tracking. We propose a multi-session benchmark (6--10 sessions spanning three distinct stages) that demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4,577 atomic skills. \textbf{2) How can we train a multi-therapy AI counselor?} Existing models often focus on a single therapy, whereas complex cases frequently require switching flexibly among strategies from different therapies. We construct a diverse dataset covering five therapeutic modalities (Psychodynamic, Behaviorism, CBT, Humanistic Existentialist, and Postmodernist) as well as an integrative therapy, all organized under a unified three-stage clinical framework and spanning six core psychological topics. \textbf{3) How can we systematically evaluate an AI counselor?} We establish a holistic evaluation framework with 18 therapy-specific and therapy-shared metrics across Client-Level and Counselor-Level dimensions. To support this evaluation, we also construct over 2,000 diverse client profiles. Extensive experimental analysis supports the quality and clinical fidelity of our dataset. Beyond static benchmarking, \texttt{PsychEval} also serves as a high-fidelity reinforcement learning environment that enables the self-evolutionary training of clinically responsible and adaptive AI counselors.