Depression is a pressing global public health issue, yet publicly available Chinese-language resources for depression risk detection remain scarce and largely focus on binary classification. To address this limitation, we release CNSocialDepress, a benchmark dataset for depression risk detection on Chinese social media. The dataset contains 44,178 posts from 233 users; psychological experts annotated 10,306 depression-related segments. CNSocialDepress provides binary risk labels along with structured, multidimensional psychological attributes, enabling interpretable and fine-grained analyses of depressive signals. Experimental results demonstrate the dataset's utility across a range of NLP tasks, including structured psychological profiling and fine-tuning large language models for depression detection. Comprehensive evaluations highlight the dataset's effectiveness and practical value for depression risk identification and psychological analysis, thereby providing insights for mental health applications tailored to Chinese-speaking populations.