Towards Efficient and Robust Linguistic Emotion Diagnosis for Mental Health via Multi-Agent Instruction Refinement

Linguistic expressions of emotions such as depression, anxiety, and trauma-related states are pervasive in clinical notes, counseling dialogues, and online mental health communities, and recognizing these emotions accurately is essential for clinical triage, risk assessment, and timely intervention. Although large language models (LLMs) generalize well on emotion analysis tasks, their diagnostic reliability in high-stakes, context-intensive medical settings remains highly sensitive to prompt design. Existing methods also face two key challenges: emotional comorbidity, in which multiple intertwined emotional states complicate prediction, and inefficient exploration of clinically relevant cues. To address these challenges, we propose APOLO (Automated Prompt Optimization for Linguistic Emotion Diagnosis), a framework that systematically explores a broader, finer-grained prompt space to improve diagnostic efficiency and robustness. APOLO formulates instruction refinement as a Partially Observable Markov Decision Process (POMDP) and adopts a multi-agent collaboration mechanism with five roles: Planner, Teacher, Critic, Student, and Target. Within this closed loop, the Planner defines an optimization trajectory, the Teacher, Critic, and Student agents iteratively refine the prompt to improve reasoning stability and effectiveness, and the Target agent evaluates performance to decide whether optimization should continue. Experimental results show that APOLO consistently improves diagnostic accuracy and robustness on domain-specific and stratified benchmarks, demonstrating a scalable and generalizable paradigm for trustworthy LLM applications in mental healthcare.
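The closed loop described above can be sketched in Python. This is a minimal, hypothetical illustration and not the authors' implementation. Each role is a plain function standing in for an LLM call. The Planner's trajectory is assumed to be an ordered list of prompt aspects. The Target's stopping rule, which halts as soon as the validation score stops improving, is an assumption made for this sketch. The keyword-based `toy_evaluate` scorer and the `CUES` and `EDITS` tables are invented placeholders for a real benchmark. The POMDP framing appears only in that the loop never sees prompt quality directly, just the scores it observes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical role signatures; these stand in for LLM calls and are not APOLO's API.
Planner = Callable[[str], List[str]]          # prompt -> ordered aspects (the trajectory)
Teacher = Callable[[str, str], str]           # (prompt, aspect) -> proposed instruction edit
Critic = Callable[[str, str], str]            # (prompt, edit) -> feedback on that edit
Student = Callable[[str, str, str], str]      # (prompt, edit, feedback) -> refined prompt
Evaluator = Callable[[str], float]            # prompt -> benchmark score (the observation)


@dataclass
class History:
    """Observable trace: prompt quality is hidden, only scores are seen."""
    prompts: List[str] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)


def target_continue(best: float, new: float, min_gain: float) -> bool:
    """Target role (assumed rule): continue only while the score improves."""
    return new - best >= min_gain


def refine(prompt: str, planner: Planner, teacher: Teacher, critic: Critic,
           student: Student, evaluate: Evaluator,
           min_gain: float = 1e-6) -> Tuple[str, History]:
    best_prompt, best_score = prompt, evaluate(prompt)
    history = History([prompt], [best_score])
    for aspect in planner(prompt):            # Planner fixes the trajectory up front
        edit = teacher(best_prompt, aspect)   # Teacher proposes an instruction edit
        feedback = critic(best_prompt, edit)  # Critic reviews the proposed edit
        candidate = student(best_prompt, edit, feedback)  # Student applies it
        score = evaluate(candidate)
        history.prompts.append(candidate)
        history.scores.append(score)
        if not target_continue(best_score, score, min_gain):
            break  # keep the best prompt seen so far and stop
        best_prompt, best_score = candidate, score
    return best_prompt, history


# Toy stand-ins: score = fraction of (hypothetical) clinical cues the prompt mentions.
CUES = ["comorbid", "suicidal ideation", "cite the text"]
EDITS = {
    "comorbidity": "Labels may be comorbid; list every state present.",
    "risk": "Flag any suicidal ideation explicitly.",
    "evidence": "For each label, cite the text span that supports it.",
}


def toy_evaluate(prompt: str) -> float:
    return sum(cue in prompt for cue in CUES) / len(CUES)


def toy_critic(prompt: str, edit: str) -> str:
    # Reject edits whose text is already present in the prompt.
    return "duplicate" if edit in prompt else "accept"


def toy_student(prompt: str, edit: str, feedback: str) -> str:
    return prompt if feedback == "duplicate" else f"{prompt} {edit}"


best, hist = refine(
    "Classify the emotions in the post.",
    planner=lambda p: ["comorbidity", "risk", "evidence"],
    teacher=lambda p, aspect: EDITS[aspect],
    critic=toy_critic, student=toy_student, evaluate=toy_evaluate,
)
print(hist.scores)
print(best)
```

Replacing the toy functions with LLM-backed agents, and `toy_evaluate` with accuracy on a held-out labeled set, would keep the same control flow. The greedy accept-or-stop rule is only one simple choice; a fuller system might tolerate several non-improving steps before the Target halts.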