Accurate identification and categorization of suicidal events can improve suicide precautions, reduce operational burden, and improve care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated four BERT-based models under two fine-tuning strategies (multiple single-label classifiers vs. a single multi-label classifier) for detecting co-occurring suicidal events in 500 annotated psychiatric evaluation notes. The notes were labeled for suicidal ideation (SI), suicide attempts (SA), exposure to suicide (ES), and non-suicidal self-injury (NSSI). With binary relevance (one binary classifier per event), RoBERTa outperformed the other models (acc=0.86, F1=0.78), and MentalBERT (F1=0.74) exceeded BioClinicalBERT (F1=0.72). RoBERTa fine-tuned with a single multi-label classifier improved performance further (acc=0.88, F1=0.81), indicating that both domain-relevant pre-training and the single multi-label classification strategy enhance efficiency and performance.

Keywords: EHR-based Phenotyping; Natural Language Processing; Secondary Use of EHR Data; Suicide Classification; BERT-based Model; Psychiatry; Mental Health
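The two fine-tuning strategies contrasted in the abstract can be illustrated with a minimal sketch. This is an assumption-laden stand-in, not the paper's pipeline: scikit-learn with TF-IDF features substitutes for BERT fine-tuning, the clinical notes are synthetic, and only the structural difference is shown — binary relevance trains one independent binary classifier per event, while the single multi-label approach trains one shared model emitting a multi-hot prediction over SI, SA, ES, and NSSI.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-ins for psychiatric evaluation notes (not real data).
notes = [
    "Patient endorses passive suicidal ideation, denies prior attempts.",
    "History of a suicide attempt last year; no current ideation reported.",
    "Superficial cutting without suicidal intent noted on examination.",
    "Recent exposure to a friend's suicide; now endorses ideation.",
]
# Multi-hot labels, one column per event: [SI, SA, ES, NSSI].
y = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
])

X = TfidfVectorizer().fit_transform(notes)

# Strategy 1: binary relevance -- an independent binary classifier
# is fit for each of the four suicidal-event labels.
binary_relevance = MultiOutputClassifier(LogisticRegression()).fit(X, y)

# Strategy 2: a single multi-label classifier -- one model with shared
# parameters predicts all four labels jointly (analogous to a single
# sigmoid output head on a fine-tuned RoBERTa encoder).
single_multilabel = MLPClassifier(
    hidden_layer_sizes=(16,), max_iter=2000, random_state=0
).fit(X, y)

# Both produce one row per note and one column per event.
print(binary_relevance.predict(X).shape)
print(single_multilabel.predict(X).shape)
```

In the multi-label setting each note can carry several labels at once (e.g. SI together with ES), which is why the paper frames the task as detecting co-occurring events rather than assigning a single class.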