Deep Temporal Modelling of Clinical Depression through Social Media Text

We describe the development of a model to detect user-level clinical depression based on a user's temporal social media posts. Our model uses a Depression Symptoms Detection (DSD) classifier, which is trained on the largest existing sample of clinician-annotated tweets for clinical depression symptoms. We then use the DSD model to extract clinically relevant features, e.g., depression scores and their temporal patterns, as well as user posting activity patterns, e.g., a quantification of a user's "no activity" or "silence." To evaluate the efficacy of these extracted features, we create three kinds of datasets, including a test dataset, from two existing well-known benchmark datasets for user-level depression detection. We report accuracy measures based on single features, baseline features, and feature ablation tests at several levels of temporal granularity. The relevant data distributions and clinical depression detection settings can be used to draw a complete picture of the impact of different features across the datasets we created. Finally, we show that, in general, only semantic-oriented representation models perform well. However, clinical features may enhance overall performance provided that the training and testing distributions are similar and there is more data in a user's timeline. As a consequence, the predictive capability of depression scores increases significantly when they are used in a more sensitive clinical depression detection setting.
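As a rough illustration of the kind of temporal features described above (per-window depression scores and a measure of "silence"), the following is a minimal sketch, not the authors' implementation. It assumes each post already carries a depression score in [0, 1] from some classifier; the function name `temporal_features`, the input layout, and the 7-day default window are hypothetical choices made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean


def temporal_features(posts, window_days=7):
    """Aggregate per-post scores into fixed-width time windows.

    posts: list of (timestamp, depression_score) pairs in any order.
    Returns (window_means, silence_ratio), where window_means holds the
    mean score per window (None for windows with no posts) and
    silence_ratio is the fraction of windows with no posts.
    """
    if not posts:
        return [], 0.0

    posts = sorted(posts, key=lambda p: p[0])
    start = posts[0][0]
    width = timedelta(days=window_days)

    # Number of windows needed to cover the first through the last post.
    n_windows = (posts[-1][0] - start) // width + 1
    buckets = [[] for _ in range(n_windows)]
    for ts, score in posts:
        buckets[(ts - start) // width].append(score)

    window_means = [mean(b) if b else None for b in buckets]
    silence_ratio = sum(1 for b in buckets if not b) / n_windows
    return window_means, silence_ratio


# Hypothetical timeline: two posts in the first week, none in the
# second, one in the third.
example_posts = [
    (datetime(2020, 1, 1), 0.2),
    (datetime(2020, 1, 3), 0.4),
    (datetime(2020, 1, 20), 0.9),
]
example_means, example_silence = temporal_features(example_posts)
```

Changing `window_days` corresponds to changing the temporal granularity; a smaller window yields a longer, sparser sequence with more silent windows.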

