Detecting mental health disorders from Arabic social media text remains challenging because of dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance. While English mental health natural language processing (NLP) has progressed substantially, multi-class disorder classification for Arabic remains understudied. This study proposes a two-phase framework for Arabic mental health text classification. In phase 1, three Arabic pre-trained language models (AraBERT, CAMeLBERT, and MARBERT) undergo Domain-Adaptive and Task-Adaptive Pretraining (DAPT and TAPT) on a large-scale corpus of unlabeled Arabic mental health tweets. The adapted models are then evaluated under a unified protocol to identify the most effective backbone. In phase 2, the selected model is assessed in four configurations that pair either a single-stage or a hierarchical two-stage classification architecture with either full fine-tuning or Low-Rank Adaptation (LoRA). To support this study, we constructed a novel annotated Arabic mental health dataset comprising 50,670 tweets across six categories, with strong inter-annotator agreement (Krippendorff's alpha = 0.733, average pairwise agreement = 0.797). Experimental results show that the domain-adapted MARBERT (MentalMARBERT) achieves statistically significant improvements over baseline models in both accuracy and macro-F1. The hierarchical two-stage architecture combined with full fine-tuning achieves the best overall performance, reaching a macro-F1 of 0.861 and an accuracy of 0.877. These findings demonstrate the effectiveness of domain-specific adaptive pretraining and hierarchical classification for Arabic mental health disorder detection.
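The abstract reports both accuracy and macro-F1 while noting severe class imbalance. The following minimal Python sketch illustrates why the two metrics can diverge on imbalanced data. It is not the study's evaluation code: the helper functions, the toy labels ("control", "depression"), and the class proportions are illustrative assumptions, since the abstract does not name the six categories or their distribution.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced toy data (labels are assumptions, not the paper's).
y_true = ["control"] * 8 + ["depression"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["control"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

In this toy case the majority-class predictor scores well on accuracy but poorly on macro-F1, because the minority class contributes an F1 of zero to the unweighted mean. This is the usual reason for reporting macro-F1 next to accuracy on imbalanced classification tasks.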