We introduce VERA-MH (Validation of Ethical and Responsible AI in Mental Health), an automated evaluation of the safety of AI chatbots used in mental health contexts, with an initial focus on suicide risk. Practicing clinicians and academic experts developed a rubric for the evaluation, informed by best practices in suicide risk management. To fully automate the process, we use two ancillary AI agents. A user-agent model simulates users engaging in a mental health-based conversation with the chatbot under evaluation, role-playing specific personas with pre-defined risk levels and other features. The simulated conversations are then passed to a judge-agent that scores them against the rubric. The final evaluation of the chatbot under test is obtained by aggregating the scores across conversations. VERA-MH is actively under development and undergoing rigorous validation by mental health clinicians to ensure that user-agents realistically act as patients and that the judge-agent accurately scores the AI chatbot. To date, we have conducted preliminary evaluations of GPT-5, Claude Opus, and Claude Sonnet using initial versions of the VERA-MH rubric and used the findings to inform further design development. Next steps include more robust clinical validation and iteration, as well as refining actionable scoring. We are seeking feedback from the community on both the technical and clinical aspects of our evaluation.
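The pipeline described above (persona-driven user-agent, chatbot under test, judge-agent scoring per rubric item, then aggregation) can be sketched as follows. This is a minimal illustrative skeleton, not the VERA-MH implementation: the agent functions are trivial placeholders standing in for model calls, and all names (`user_agent`, `judge_agent`, `run_evaluation`, the rubric items) are assumptions introduced here.

```python
import statistics

def user_agent(persona, history):
    """Placeholder for the simulated user: role-plays a persona
    with a pre-defined risk level (would be an LLM call in practice)."""
    return f"[{persona['risk_level']} risk] user turn {len(history) // 2 + 1}"

def chatbot_under_test(history):
    """Placeholder for the chatbot being evaluated."""
    return "supportive response"

def judge_agent(conversation, rubric):
    """Placeholder judge: scores one transcript against each rubric item.
    A real judge-agent would apply the clinical rubric via an LLM."""
    return {item: 1.0 for item in rubric}

def run_evaluation(personas, rubric, n_turns=3):
    """Simulate one conversation per persona, score each with the judge,
    and aggregate per-conversation scores into a final evaluation."""
    conversation_scores = []
    for persona in personas:
        history = []
        for _ in range(n_turns):
            history.append(("user", user_agent(persona, history)))
            history.append(("bot", chatbot_under_test(history)))
        per_item = judge_agent(history, rubric)
        conversation_scores.append(statistics.mean(per_item.values()))
    # Final score: mean over all simulated conversations
    return statistics.mean(conversation_scores)

# Hypothetical rubric items and personas, for illustration only
rubric = ["risk_detection", "appropriate_referral", "no_harmful_content"]
personas = [{"risk_level": "low"}, {"risk_level": "high"}]
print(run_evaluation(personas, rubric))  # 1.0 with the placeholder judge
```

In the actual system, each placeholder would be backed by a model (the user-agent and judge-agent are separate AI agents), and the aggregation scheme would follow the clinically validated rubric rather than a simple mean.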