Mental health challenges and cyberbullying are increasingly prevalent in digital spaces, necessitating scalable and interpretable detection systems. This paper introduces a unified multiclass classification framework for detecting ten distinct mental health and cyberbullying categories from social media data. We curate datasets from Twitter and Reddit, implementing a rigorous "split-then-balance" pipeline to train on balanced data while evaluating on a realistic, held-out imbalanced test set. We conduct a comprehensive evaluation comparing traditional lexical models, hybrid approaches, and several end-to-end fine-tuned transformers. Our results demonstrate that end-to-end fine-tuning is critical for performance: the domain-adapted MentalBERT emerges as the top model, achieving an accuracy of 0.92 and a macro F1 score of 0.76, surpassing both its generic counterpart and a zero-shot LLM baseline. Grounded in a comprehensive ethical analysis, we frame the system as a human-in-the-loop screening aid, not a diagnostic tool. To support this, we introduce a hybrid SHAP-LLM explainability framework and present a prototype dashboard ("Social Media Screener") designed to integrate model predictions and their explanations into a practical workflow for moderators. Our work provides a robust baseline and highlights the need for multi-label, clinically validated datasets at the critical intersection of online safety and computational mental health.
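The "split-then-balance" evaluation pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name, the use of simple random oversampling, and the split fraction are all assumptions. The essential property it demonstrates is that balancing is applied only after the train/test split, so the held-out test set retains the original imbalanced class distribution.

```python
import random
from collections import defaultdict

def split_then_balance(samples, labels, test_frac=0.2, seed=0):
    """Split first, then oversample minority classes in the TRAIN split only.

    Returns (train, test) as lists of (sample, label) pairs. The test set
    keeps its original, imbalanced label distribution.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]

    # Group training examples by class, then oversample each class
    # (sampling with replacement) up to the majority-class count.
    by_class = defaultdict(list)
    for pair in train:
        by_class[pair[1]].append(pair)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced, test
```

In practice one would use stratified splitting and a library oversampler, but the ordering constraint (split before balance) is the point: balancing the full dataset before splitting would leak duplicated minority examples into the test set and inflate reported metrics.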