Social media platforms have transformed traditional communication by enabling people worldwide to connect instantly, openly, and frequently. People use social media to share personal stories and express their opinions. Negative emotions, including thoughts of death, self-harm, and hardship, are commonly expressed on social media, particularly among younger generations. Detecting suicidal thoughts on social media can therefore support timely intervention that may deter people from self-harm and suicide and limit the spread of suicidal ideation on these platforms. To investigate whether suicidal thoughts in Arabic tweets can be detected automatically, we developed a novel dataset of Arabic suicidal tweets; examined several machine learning models, including Naïve Bayes, Support Vector Machine (SVM), K-Nearest Neighbor, Random Forest (RF), and XGBoost, trained on word frequency and word embedding features; and evaluated the ability of the pre-trained deep learning models AraBERT, AraELECTRA, and AraGPT2 to identify suicidal thoughts in Arabic tweets. The results indicate that the SVM and RF models trained on character n-gram features performed best among the machine learning models, with 86% accuracy and an F1 score of 79%. Among the deep learning models, AraBERT outperforms all other machine learning and deep learning models, achieving an accuracy of 91% and an F1 score of 88%, which significantly improves the detection of suicidal ideation in the Arabic tweets dataset. To the best of our knowledge, this is the first study to develop an Arabic suicidality detection dataset from Twitter and to use deep learning approaches to detect suicidality in Arabic posts.
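As a rough illustration of two ideas mentioned in the abstract, the sketch below counts character n-grams in a string and computes accuracy alongside F1 for a binary task. It is not the authors' pipeline: the function names, the n-gram range of 2 to 4, and the choice to compute F1 on the positive class are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max (range is an assumption)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the string, one character at a time.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; F1 is for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Character n-grams work on Unicode code points, so Arabic text needs no special handling.
    print(char_ngrams("سلام", 2, 2))
    # One missed positive out of five examples: accuracy stays high while F1 drops.
    print(accuracy_and_f1([1, 0, 0, 0, 1], [1, 0, 0, 0, 0]))
```

Accuracy and F1 can diverge, as in the reported figures, because accuracy counts every correct prediction equally while F1 considers only the precision and recall of the positive class.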