Suicide remains a pressing global health concern, necessitating innovative approaches for early detection and intervention. This paper focuses on identifying suicidal intent in posts from the SuicideWatch subreddit by proposing a novel deep-learning approach built on the state-of-the-art RoBERTa-CNN model. The Robustly Optimized BERT Pretraining Approach (RoBERTa) excels at capturing textual nuances and forming semantic relationships within text. The Convolutional Neural Network (CNN) head on top of RoBERTa enhances its capacity to discern critical patterns in large datasets. To evaluate RoBERTa-CNN, we conducted experiments on the Suicide and Depression Detection dataset, yielding promising results: RoBERTa-CNN achieves a mean accuracy of 98% with a standard deviation (STD) of 0.0009. We also found that data quality significantly affects the robustness of the trained model. To improve data quality, we removed noise from the text while preserving its contextual content, either by manual cleaning or by using the OpenAI API.
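To make the architecture concrete, the sketch below illustrates the standard text-CNN head pattern that the abstract describes: token-level hidden states (as RoBERTa would produce) pass through a 1-D convolution over token positions, ReLU, max-over-time pooling, and a linear classifier. All dimensions, filter counts, and window sizes here are illustrative assumptions, not the paper's actual hyperparameters, and random arrays stand in for RoBERTa outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration; RoBERTa-base would emit hidden size 768)
seq_len, hidden = 16, 32   # tokens x hidden size of the encoder output
n_filters, width = 4, 3    # CNN head: 4 filters, convolution window of 3 tokens
n_classes = 2              # suicidal vs. non-suicidal post

def cnn_head(h, W, b, Wo, bo):
    """1-D convolution over token positions, ReLU, max-over-time pooling,
    then a linear classifier - the standard text-CNN head pattern."""
    # Slide a window of `width` tokens and flatten each window into a vector
    windows = np.stack([h[i:i + width].ravel()
                        for i in range(seq_len - width + 1)])
    conv = np.maximum(windows @ W + b, 0.0)   # (positions, n_filters), ReLU
    pooled = conv.max(axis=0)                 # max-over-time -> (n_filters,)
    return pooled @ Wo + bo                   # class logits -> (n_classes,)

h = rng.standard_normal((seq_len, hidden))    # stand-in for RoBERTa hidden states
W = rng.standard_normal((width * hidden, n_filters))
b = np.zeros(n_filters)
Wo = rng.standard_normal((n_filters, n_classes))
bo = np.zeros(n_classes)

logits = cnn_head(h, W, b, Wo, bo)
print(logits.shape)  # (2,)
```

In practice the convolution would run over real encoder outputs and be trained end-to-end with the encoder; this sketch only shows how a convolutional head condenses per-token features into a fixed-size vector for classification.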