Detection and Analysis of Stress-Related Posts in Reddit Academic Communities

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Monitoring stress levels and recognizing early signs of mental illness is increasingly important. Automatic stress detection in text can help manage stress proactively and protect mental well-being. Social media platforms reflect the psychological well-being and stress levels of the communities that use them. This study focuses on detecting and analyzing stress-related posts in Reddit academic communities, which have become central venues for academic discussion and support with the growth of online education and remote work. We classify text as stressed or not stressed using natural language processing and machine learning classifiers, trained on Dreaddit, a dataset of labeled Reddit posts. We then collect and analyze posts from various academic subreddits. We find that the most effective individual feature for stress detection is Bag of Words, paired with a Logistic Regression classifier, which achieves 77.78% accuracy and an F1 score of 0.79 on the Dreaddit dataset. This combination also performs best on human-annotated datasets, with 72% accuracy. Our key finding is that posts and comments in professors' Reddit communities are the most stressed, compared with communities at other academic levels, including bachelor's, graduate, and Ph.D. students. This research contributes to the understanding of stress levels within academic communities and can help academic institutions and online communities develop measures and interventions to address the issue effectively.
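
The abstract names Bag of Words with Logistic Regression as the best-performing combination but gives no implementation details. The following is a minimal sketch of that general technique, written in pure Python so that each step is visible. It is not the authors' pipeline: the six example posts, their labels, the function names, and the hyperparameters (`lr`, `epochs`) are all hypothetical and are not drawn from Dreaddit or from the study.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts):
    """Map every distinct token in the corpus to a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Bag-of-Words count vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(text, vocab, w, b):
    """Return (label, probability); label 1 means 'stressed'."""
    x = vectorize(text, vocab)
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return (1 if p >= 0.5 else 0), p

def f1_score(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical toy posts (invented for illustration; 1 = stressed, 0 = not).
train_texts = [
    "I am so stressed about my thesis deadline and cannot sleep",
    "My advisor ignored my emails again and I feel anxious and overwhelmed",
    "I am panicking about failing my qualifying exam",
    "Had a great day in the lab and the experiment worked",
    "Any recommendations for a good statistics textbook",
    "Happy to share that our paper was accepted",
]
labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(train_texts)
X = [vectorize(t, vocab) for t in train_texts]
w, b = train_logreg(X, labels)
preds = [predict(t, vocab, w, b)[0] for t in train_texts]
```

Calling `predict("I feel stressed and anxious about my exam", vocab, w, b)` returns a label and a probability for an unseen post, and `f1_score(labels, preds)` computes the F1 metric on the toy training set. Scores on six hand-written sentences say nothing about real performance. A practical version would train on a labeled corpus such as Dreaddit, evaluate on held-out data, and most likely rely on an established machine learning library rather than hand-written gradient descent.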
