The increasing prevalence of mental disorders globally highlights the urgent need for effective digital screening methods that can be used in multilingual contexts. Most existing studies, however, focus on English data, overlooking critical mental health signals that may be present in non-English texts. To address this gap, we present a survey of research on detecting mental disorders from social media data in languages other than English. We compile a comprehensive list of 108 datasets spanning 25 languages that can be used for developing NLP models for mental health screening. In addition, we discuss the cultural nuances that influence online language patterns and self-disclosure behaviors, and how these factors can impact the performance of NLP tools. Our survey highlights major challenges, including the scarcity of resources for low- and mid-resource languages and the dominance of depression-focused data over other disorders. By identifying these gaps, we advocate for interdisciplinary collaborations and the development of multilingual benchmarks to enhance mental health screening worldwide.