Patterns vs. Patients: Evaluating LLMs against Mental Health Professionals on Personality Disorder Diagnosis through First-Person Narratives

Growing reliance on LLMs for psychiatric self-assessment raises questions about their ability to interpret qualitative patient narratives. This depth-over-breadth case study directly compares state-of-the-art LLMs with mental health professionals in assessing Borderline (BPD) and Narcissistic (NPD) Personality Disorders from Polish-language first-person autobiographical accounts. Within our sample, the overall diagnostic score of the top-performing Gemini Pro models (65.48%) was 21.91 percentage points higher than the average score of the human professionals (43.57%). While both models and human experts excelled at identifying BPD (F1 = 83.4 and F1 = 80.0, respectively), models severely underdiagnosed NPD (F1 = 6.7 vs. 50.0), suggesting a possible reluctance toward the value-laden term "narcissism." Qualitatively, models provided confident, elaborate justifications focused on patterns and formal categories, while human experts remained concise and cautious, emphasizing the patients' sense of self and temporal experience. Our findings suggest that while LLMs may be competent at interpreting complex first-person clinical data, their outputs still carry critical reliability and bias issues.
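
For readers less familiar with the metrics quoted above, the following is a minimal sketch of how a percentage-point gap and an F1 score are computed. It assumes the F1 values in the abstract are reported on a 0-100 scale. The function names and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def percentage_point_gap(score_a: float, score_b: float) -> float:
    """Difference between two percentages, expressed in percentage points."""
    return round(score_a - score_b, 2)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 (harmonic mean of precision and recall) on a 0-100 scale.

    tp, fp, fn are true positives, false positives, and false negatives
    for a single diagnostic label (for example, NPD).
    """
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: 65.48% (models) vs. 43.57% (professionals).
gap = percentage_point_gap(65.48, 43.57)
print(gap)  # 21.91

# Hypothetical counts (NOT from the study): a rater who assigns a label to
# only 1 of 10 true cases, with no false alarms, has perfect precision but
# very low recall, which pulls F1 down sharply.
illustrative_f1 = f1_score(tp=1, fp=0, fn=9)
print(round(illustrative_f1, 2))
```

The second example is only meant to show why underdiagnosis (many missed cases) can produce a low F1 even when the few positive calls are correct; the abstract does not report the precision and recall behind its NPD figure.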
