The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, and abandonment, as well as from mental health conditions such as depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar disorder. As these conditions develop, signs of suicidal ideation may manifest in social media interactions. Analyzing social media data with artificial intelligence (AI) techniques can help identify patterns of suicidal behavior, providing insights for suicide prevention agencies, professionals, and broader community awareness initiatives. Machine learning algorithms for this purpose require large volumes of accurately labeled data. Previous research has not fully explored the potential of incorporating model explanations into the analysis and labeling of longitudinal social media data. In this study, we applied a model explanation method, Layer Integrated Gradients, on top of a fine-tuned state-of-the-art language model to assign each token in Reddit users' posts an attribution score for predicting suicidal ideation. By extracting and analyzing these token attributions, we propose a methodology for preliminary screening of social media posts for suicidal ideation that does not require a large language model at inference time.
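One way to picture the screening idea is to aggregate token attributions into a lookup table offline and then score new posts against that table alone. The sketch below is an illustration under stated assumptions, not the study's actual procedure: the function names `build_lexicon` and `screen_post`, the `min_count` and `threshold` parameters, the whitespace tokenization, and the toy attribution values are all hypothetical. In practice the `(token, score)` pairs would come from an attribution method such as Layer Integrated Gradients run once over labeled posts.

```python
from collections import defaultdict


def build_lexicon(attributed_posts, min_count=2):
    """Average per-token attribution scores across posts.

    attributed_posts: list of posts, each a list of (token, score) pairs.
    The scores are assumed to have been computed offline by an
    attribution method; this function never calls a model.
    Tokens seen fewer than min_count times are dropped as unreliable.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for post in attributed_posts:
        for token, score in post:
            key = token.lower()
            totals[key] += score
            counts[key] += 1
    return {t: totals[t] / counts[t] for t in totals if counts[t] >= min_count}


def screen_post(text, lexicon, threshold=0.2):
    """Return (mean lexicon score, flagged) for a post.

    Only tokens present in the lexicon contribute. A post with no known
    tokens gets a score of 0.0 and is not flagged. The threshold is a
    hypothetical value that would need tuning on held-out data.
    """
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return 0.0, False
    mean_score = sum(scores) / len(scores)
    return mean_score, mean_score > threshold


# Toy attribution values, invented purely for illustration.
attributed_posts = [
    [("i", 0.0), ("feel", 0.1), ("hopeless", 0.8)],
    [("feel", 0.3), ("hopeless", 0.6), ("today", -0.1)],
    [("great", -0.5), ("game", -0.3), ("today", -0.1)],
    [("great", -0.3), ("game", -0.5)],
]

lexicon = build_lexicon(attributed_posts)
flagged_score, flagged = screen_post("I feel hopeless", lexicon)
neutral_score, neutral_flagged = screen_post("great game today", lexicon)
```

Because the lookup needs only a dictionary and simple arithmetic, a screen of this kind could run cheaply as a first pass, with flagged posts passed on to human reviewers or to a heavier model.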