While Large Language Models (LLMs) demonstrate significant potential for providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domain's high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs fail to capture mental health-specific requirements, underscoring an urgent need to assess and improve model trustworthiness in this setting. To address this gap, we propose TrustMH-Bench, a holistic framework designed to systematically quantify the trustworthiness of mental health LLMs. By establishing a mapping from domain-specific norms to quantitative evaluation metrics, TrustMH-Bench evaluates models across eight core pillars: Reliability, Crisis Identification and Escalation, Safety, Fairness, Privacy, Robustness, Anti-sycophancy, and Ethics. We conduct extensive experiments on six general-purpose LLMs and six specialized mental health models. The results reveal significant deficiencies: the evaluated models underperform across multiple trustworthiness dimensions in mental health scenarios. Notably, even generally strong models (e.g., GPT-5.1) fail to maintain consistently high performance across all dimensions. Systematically improving the trustworthiness of LLMs in this domain therefore remains a critical task. Our data and code are publicly released.