Uncovering Intervention Opportunities for Suicide Prevention with Language Model Assistants

Warning: This paper discusses suicide and suicidal ideation, which may be distressing to some readers.

The National Violent Death Reporting System (NVDRS) documents information about suicides in the United States, including free-text narratives (e.g., the circumstances surrounding a suicide). In a demanding public health data pipeline, annotators manually extract structured information from death investigation records, following extensive guidelines painstakingly developed by experts. In this work, we facilitate data-driven insights from NVDRS data to support the development of novel suicide interventions by investigating the value of language models (LMs) as efficient assistants to (a) these data annotators and (b) experts. We find that LM predictions match existing data annotations about 85% of the time across 50 NVDRS variables. In cases where the LM disagrees with existing annotations, expert review reveals that LM assistants surface annotation discrepancies 38% of the time. Finally, we introduce a human-in-the-loop algorithm that helps experts efficiently build and refine guidelines for annotating new variables by letting them focus only on providing feedback for incorrect LM predictions. We apply our algorithm to a real-world case study of a new variable that characterizes victim interactions with lawyers and demonstrate that it achieves annotation quality comparable to that of a laborious manual approach. Our findings provide evidence that LMs can serve as effective assistants to public health researchers who handle sensitive data in high-stakes scenarios.
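To make the two quantities in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It shows (1) an agreement rate between LM predictions and existing annotations, and (2) a feedback loop in which a guideline note is added only when a prediction is judged incorrect. All names here (`agreement_rate`, `refine_guidelines`, `toy_predict`, `toy_review`) are hypothetical, the LM is replaced by a keyword rule, the expert is replaced by a fixed rule, and the stopping criterion is an assumption, since the abstract does not specify one.

```python
from typing import Callable, List, Sequence, Tuple


def agreement_rate(predictions: Sequence[bool], annotations: Sequence[bool]) -> float:
    """Fraction of items where the prediction equals the existing annotation."""
    if len(predictions) != len(annotations):
        raise ValueError("predictions and annotations must have the same length")
    matches = sum(p == a for p, a in zip(predictions, annotations))
    return matches / len(annotations)


def refine_guidelines(
    narratives: List[str],
    predict: Callable[[str, List[str]], bool],
    review: Callable[[str, bool], Tuple[bool, str]],
    max_rounds: int = 5,
) -> List[str]:
    """Collect guideline notes, asking for feedback only on incorrect predictions.

    Assumption: the loop stops once a full pass produces no new notes.
    """
    guidelines: List[str] = []
    for _ in range(max_rounds):
        new_notes = 0
        for text in narratives:
            predicted = predict(text, guidelines)
            is_correct, note = review(text, predicted)
            if not is_correct:
                # Feedback is recorded only when the prediction is wrong.
                guidelines.append(note)
                new_notes += 1
        if new_notes == 0:
            break
    return guidelines


def toy_predict(text: str, guidelines: List[str]) -> bool:
    """Hypothetical stand-in for an LM: keyword match using the current guidelines."""
    keywords = ["lawyer"] + guidelines
    return any(keyword in text.lower() for keyword in keywords)


def toy_review(text: str, predicted: bool) -> Tuple[bool, str]:
    """Hypothetical stand-in for an expert who knows 'attorney' also counts."""
    truth = any(keyword in text.lower() for keyword in ("lawyer", "attorney"))
    if predicted == truth:
        return True, ""
    return False, "attorney"


# Synthetic example sentences, not NVDRS data.
toy_narratives = [
    "Met with a lawyer last week.",
    "Spoke to an attorney about a contract.",
    "No legal contact was noted.",
]

learned = refine_guidelines(toy_narratives, toy_predict, toy_review)
print(learned)
print(agreement_rate([True, False, True, True], [True, True, True, True]))
```

In this toy run the reviewer is consulted for a note only on the one sentence the keyword rule gets wrong, which mirrors the idea of limiting expert effort to incorrect predictions. A real setting would replace both stand-ins with an actual model call and a human reviewer.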
