Mental health problems such as anxiety, depression, and suicide remain urgent global challenges, and timely, accurate assessment is critical for effective intervention. Large language models (LLMs) have recently been explored for mental health assessment. However, existing general-purpose post-training methods are not aligned with the cognitive processes that underlie human assessment, which may lead to unreliable reasoning. To bridge this gap, we propose Cognitive Relative Policy Optimization (CRPO), a reinforcement learning framework tailored to the mental health domain. CRPO extends Group Relative Policy Optimization (GRPO) by integrating stage-dependent uncertainty modeling into policy optimization. Specifically, we introduce a stage-wise entropy regularization mechanism that encourages broad exploration in early reasoning stages and progressively enforces confident decision-making in later stages, mimicking the human cognitive shift from uncertainty to certainty. In addition, inspired by cognitive appraisal theory, we formalize the cognitive stages of reasoning, which guides theory-grounded, interpretable inference. Experiments on eight mental health datasets show that CRPO improves weighted F1-score by an average of 10.4 percentage points over the best reinforcement learning baseline. Furthermore, Mental-R1, the model trained with CRPO, shows clear advantages over existing LLMs on reasoning-intensive cases, suggesting that CRPO enhances reasoning capabilities for mental health assessment.
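The abstract describes stage-wise entropy regularization only in words, so the following is a minimal sketch of how such a term could sit next to GRPO-style group-relative advantages. It is not the authors' formulation: the linear schedule, the coefficient values `beta_start` and `beta_end`, the number of stages, and all function names are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stage_entropy_coeff(stage, num_stages, beta_start=0.02, beta_end=-0.02):
    """Hypothetical linear schedule over reasoning stages (assumption).

    A positive coefficient rewards entropy (exploration) in early stages;
    a negative coefficient penalizes entropy (pushes toward confidence) late.
    """
    if num_stages < 1 or not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    if num_stages == 1:
        return beta_end
    frac = stage / (num_stages - 1)
    return beta_start + frac * (beta_end - beta_start)


def stage_wise_entropy_term(stage_entropies, beta_start=0.02, beta_end=-0.02):
    """Sum of coefficient * mean entropy over stages, to be added to the objective.

    stage_entropies[i] is the mean token entropy of the text generated in stage i.
    """
    n = len(stage_entropies)
    return sum(
        stage_entropy_coeff(i, n, beta_start, beta_end) * h
        for i, h in enumerate(stage_entropies)
    )


if __name__ == "__main__":
    # Four sampled responses to one prompt, with binary correctness rewards.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
    # Three reasoning stages whose mean entropy falls over time.
    print(stage_wise_entropy_term([1.0, 0.5, 0.1]))
```

Under this assumed schedule, a response that stays uncertain in its final stage is penalized while early uncertainty is rewarded, which is one way to encode the shift from exploration to confident decision-making that the abstract describes.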