We present a comprehensive evaluation framework for assessing the capabilities of Large Language Models (LLMs) in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce \ourdata, a novel dataset of 1,308 test cases built upon psychological frameworks, including D/S-IAT and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with eight widely used LLMs under different contextual settings, we find that current models struggle significantly both to detect implicit suicidal ideation and to provide appropriate support, revealing crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches to developing and evaluating LLMs for sensitive psychological applications.