While Large Language Models (LLMs) demonstrate significant potential for providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domain's high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs fail to capture mental health-specific requirements, underscoring an urgent need to assess and improve model trustworthiness in this setting. To address this gap, we propose TrustMH-Bench, a holistic framework designed to systematically quantify the trustworthiness of mental health LLMs. By establishing a mapping from domain-specific norms to quantitative evaluation metrics, TrustMH-Bench evaluates models across eight core pillars: Reliability, Crisis Identification and Escalation, Safety, Fairness, Privacy, Robustness, Anti-sycophancy, and Ethics. We conduct extensive experiments on six general-purpose LLMs and six specialized mental health models. The results reveal significant deficiencies: the evaluated models underperform across multiple trustworthiness dimensions in mental health scenarios. Notably, even generally strong models (e.g., GPT-5.1) fail to maintain consistently high performance across all dimensions. Systematically improving the trustworthiness of LLMs in this domain therefore remains a critical task. Our data and code are publicly released.