The increasing frequency of suicidal thoughts highlights the importance of early detection and intervention. Social media platforms, where users often share personal experiences and seek help, could be used to identify individuals at risk. However, the large volume of daily posts makes manual review impractical. This paper explores the use of Large Language Models (LLMs) to automatically detect suicidal content in text-based social media posts. We propose a novel method that generates pseudo-labels for unlabeled data by prompting LLMs and combines them with traditional classification fine-tuning techniques to improve label accuracy. To build a strong suicide detection model, we develop an ensemble approach that combines prompting with Qwen2-72B-Instruct and fine-tuned models such as Llama3-8B, Llama3.1-8B, and Gemma2-9B. We evaluate our approach on the dataset of the Suicide Ideation Detection on Social Media Challenge, a track of the IEEE Big Data 2024 Big Data Cup. Additionally, we conduct a comprehensive analysis to assess the impact of different models and fine-tuning strategies on detection performance. Experimental results show that the ensemble model significantly improves detection accuracy, by 5 percentage points compared with the individual models. It achieves a weighted F1 score of 0.770 on the public test set and 0.731 on the private test set, providing a promising solution for identifying suicidal content in social media. Our analysis shows that the choice of LLM affects prompting performance, with larger models providing better accuracy. Our code and checkpoints are publicly available at https://github.com/khanhvynguyen/Suicide_Detection_LLMs.
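The ensemble and the weighted F1 metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the individual model outputs are combined, so majority voting with a first-model tie-break is an assumption here, not the authors' method. The function names `majority_vote` and `weighted_f1` and the class labels `"a"`, `"b"`, `"c"` are hypothetical placeholders.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote.

    Assumption: ties go to the earliest model in the list whose vote is
    among the most frequent labels. The paper may combine models differently.
    """
    ensembled = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        winner = next(v for v in votes if counts[v] == top)
        ensembled.append(winner)
    return ensembled


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical predictions from three models on four posts.
model_outputs = [
    ["a", "b", "a", "c"],
    ["a", "a", "b", "c"],
    ["b", "a", "c", "c"],
]
gold = ["a", "a", "b", "c"]

combined = majority_vote(model_outputs)
print(combined)
print(round(weighted_f1(gold, combined), 3))
```

Weighting each class's F1 by its support means frequent classes dominate the score, which matters when the label distribution is imbalanced.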