Detecting depression from social media text remains challenging because of diverse language styles, informal expression, and the lack of annotated data in many languages. To address these issues, we propose Semi-SMDNet, a Semi-Supervised Multilingual Depression detection Network that combines teacher-student pseudo-labelling, ensemble learning, and data augmentation. Our framework uses an ensemble of teacher models whose predictions are aggregated through soft voting. An uncertainty-based threshold filters out low-confidence pseudo-labels to reduce noise and improve learning stability, and a confidence-weighted training scheme emphasises reliable pseudo-labelled samples, which improves robustness across languages. Experiments on Arabic, Bangla, English, and Spanish datasets show that our approach consistently outperforms strong baselines and significantly narrows the performance gap between high-resource and low-resource settings. Extensive experiments and analyses confirm the effectiveness and generalisability of the framework, indicating that it is suitable for scalable, cross-lingual mental health monitoring where labelled resources are limited.
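To make the pseudo-labelling pipeline concrete, the sketch below shows one possible form of its three steps: averaging teacher probabilities (soft voting), discarding samples whose averaged prediction is too uncertain, and weighting the student's loss by pseudo-label confidence. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the uncertainty measure, the threshold value, or the weighting rule, so normalised entropy, the threshold `max_uncertainty=0.7`, and the use of the soft-vote maximum probability as the sample weight are all hypothetical choices, as are the function names.

```python
import math
from typing import List, Tuple

def soft_vote(teacher_probs: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities over teachers; teacher_probs[t][i][c]."""
    n_teachers = len(teacher_probs)
    n_samples = len(teacher_probs[0])
    n_classes = len(teacher_probs[0][0])
    return [
        [sum(teacher_probs[t][i][c] for t in range(n_teachers)) / n_teachers
         for c in range(n_classes)]
        for i in range(n_samples)
    ]

def normalized_entropy(p: List[float]) -> float:
    """Assumed uncertainty measure: entropy of p divided by log(C), in [0, 1]."""
    h = -sum(q * math.log(q) for q in p if q > 0)
    return h / math.log(len(p))

def select_pseudo_labels(avg_probs: List[List[float]],
                         max_uncertainty: float = 0.7) -> List[Tuple[int, int, float]]:
    """Return (sample index, pseudo-label, weight) for sufficiently certain samples.

    Hypothetical threshold and weighting: keep samples whose normalised entropy
    is at most max_uncertainty; weight each by its soft-vote max probability.
    """
    selected = []
    for i, p in enumerate(avg_probs):
        if normalized_entropy(p) <= max_uncertainty:
            label = max(range(len(p)), key=lambda c: p[c])
            selected.append((i, label, p[label]))
    return selected

def weighted_cross_entropy(student_probs: List[List[float]],
                           selected: List[Tuple[int, int, float]]) -> float:
    """Confidence-weighted cross-entropy of the student on selected pseudo-labels."""
    total_w = sum(w for _, _, w in selected)
    if total_w == 0:
        return 0.0  # no reliable pseudo-labels in this batch
    loss = sum(-w * math.log(max(student_probs[i][y], 1e-12))
               for i, y, w in selected)
    return loss / total_w
```

For example, with two teachers scoring three posts over two classes, a post on which the teachers disagree evenly averages to `[0.5, 0.5]`, has maximal normalised entropy, and is excluded from the student's training signal, while posts with a clear consensus are kept and weighted by that consensus.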