The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, and abandonment, as well as from mental health conditions such as depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar disorder. As these conditions develop, signs of suicidal ideation may surface in social media interactions. Analyzing social media data with artificial intelligence (AI) techniques can help identify patterns of suicidal behavior, providing valuable insights for suicide prevention agencies, professionals, and broader community awareness initiatives. Machine learning algorithms for this purpose require large volumes of accurately labeled data. Previous research has not fully explored the potential of incorporating model explanations into the analysis and labeling of longitudinal social media data. In this study, we apply a model explanation method, Layer Integrated Gradients, to a fine-tuned state-of-the-art language model, assigning each token in Reddit users' posts an attribution score for the prediction of suicidal ideation. By extracting and analyzing these token attributions, we propose a methodology for preliminary screening of social media posts for suicidal ideation that does not require a large language model at inference time.
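The attribution-then-screening idea can be illustrated with a minimal sketch. It is not the study's pipeline: a toy mean-pooled logistic classifier stands in for the fine-tuned language model, and the embeddings, weights, threshold, and the helper names `token_attributions`, `build_lexicon`, and `screen` are all hypothetical. Integrated gradients are computed at the embedding layer against an all-zero baseline, averaged per token into a lexicon, and the lexicon alone is then used to score new posts. With a real transformer, a library implementation such as Captum's `LayerIntegratedGradients` would likely replace the hand-written integral.

```python
import math
from collections import defaultdict

# Hypothetical 2-d token embeddings and linear head (invented numbers),
# standing in for a fine-tuned language model.
EMBEDDINGS = {
    "hopeless": [1.2, 0.8],
    "alone": [0.9, 0.4],
    "tired": [0.5, 0.1],
    "happy": [-1.0, -0.6],
    "today": [0.0, 0.1],
}
WEIGHTS = [1.5, 1.0]
BIAS = -1.0
ZERO = [0.0, 0.0]  # baseline embedding; also used for unknown tokens


def predict(vectors):
    """Probability of the positive class from mean-pooled embeddings."""
    n = len(vectors)
    pooled = [sum(v[d] for v in vectors) / n for d in range(len(WEIGHTS))]
    z = sum(w * p for w, p in zip(WEIGHTS, pooled)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def token_attributions(tokens, steps=200):
    """Integrated gradients at the embedding layer, summed per token."""
    if not tokens:
        return []
    inputs = [EMBEDDINGS.get(t, ZERO) for t in tokens]
    n = len(inputs)
    # For this model, d p / d x[i][d] = p * (1 - p) * WEIGHTS[d] / n.
    # Average p * (1 - p) along the straight path from the zero baseline
    # to the input (midpoint Riemann sum).
    avg_slope = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        p = predict([[alpha * x for x in v] for v in inputs])
        avg_slope += p * (1.0 - p)
    avg_slope /= steps
    # (input - baseline) * average gradient, summed over embedding dims.
    return [
        sum(x * avg_slope * w / n for x, w in zip(v, WEIGHTS)) for v in inputs
    ]


def build_lexicon(posts):
    """Average each token's attribution over a collection of posts."""
    totals, counts = defaultdict(float), defaultdict(int)
    for post in posts:
        tokens = post.lower().split()
        for tok, score in zip(tokens, token_attributions(tokens)):
            totals[tok] += score
            counts[tok] += 1
    return {tok: totals[tok] / counts[tok] for tok in totals}


def screen(post, lexicon, threshold=0.1):
    """Flag a post using only the lexicon; no model call is needed here."""
    score = sum(lexicon.get(tok, 0.0) for tok in post.lower().split())
    return score >= threshold


lexicon = build_lexicon(["hopeless alone today", "happy today"])
print(screen("hopeless alone", lexicon), screen("happy today", lexicon))
```

A useful sanity check on any integrated-gradients implementation is the completeness property: the token attributions for a post should sum to the difference between the model's output on the post and on the baseline. The whitespace tokenizer and fixed threshold are simplifications; a real screening step would reuse the model's tokenizer and calibrate the threshold on held-out labeled data.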