Depression is a pressing global public health issue, yet publicly available Chinese-language resources for risk detection remain scarce and are mostly limited to binary classification. To address this limitation, we release CNSocialDepress, a benchmark dataset for detecting depression risk from Chinese social media posts. The dataset contains 44,178 texts from 233 users, in which psychological experts annotated 10,306 depression-related segments. CNSocialDepress provides binary risk labels together with structured, multi-dimensional psychological attributes, enabling interpretable and fine-grained analysis of depressive signals. Experimental results demonstrate its utility across a range of NLP tasks, including structured psychological profiling and fine-tuning of large language models for depression detection. These evaluations highlight the dataset's practical value for depression risk identification and psychological analysis, and offer insights for mental health applications tailored to Chinese-speaking populations.