Emotions are integral to human social interaction, and different situational contexts elicit diverse emotional responses. In particular, the prevalence of negative emotional states has been correlated with adverse mental health outcomes, which calls for a comprehensive analysis of their occurrence and impact on individuals. In this paper, we introduce DepressionEmo, a novel dataset designed to detect 8 emotions associated with depression, comprising 6,037 examples of long Reddit user posts. The dataset was created by majority vote over zero-shot classifications from pre-trained models, and its quality was validated by human annotators and ChatGPT, with an acceptable level of inter-rater reliability between annotators. We analyze the correlations between emotions, their distribution over time, and the linguistic characteristics of DepressionEmo. In addition, we evaluate several text classification methods in two groups: machine learning methods such as SVM, XGBoost, and LightGBM; and deep learning methods such as BERT, GAN-BERT, and BART. The pre-trained BART model, bart-base, achieves the highest F1-Macro of 0.76, outperforming the other methods evaluated in our analysis. Across all emotions, the highest F1-Macro value is achieved for suicide intent, suggesting that our dataset has value for identifying emotions in individuals with depression symptoms through text analysis. The curated dataset is publicly available at: https://github.com/abuBakarSiddiqurRahman/DepressionEmo.
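To make the two quantitative ideas above concrete, the following minimal Python sketch shows one way a multi-label majority vote over several models' predictions could be computed, together with a macro-averaged F1 score. It is an illustration under stated assumptions, not the authors' pipeline: the function names `majority_vote` and `macro_f1`, the strict-majority threshold, and every label other than "suicide intent" are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, min_votes=None):
    """Keep each label that at least `min_votes` models predicted.

    `predictions` is a list of label collections, one per model.
    Assumption: a strict majority (more than half of the models) by default.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # Count each label at most once per model.
    counts = Counter(label for labels in predictions for label in set(labels))
    return {label for label, n in counts.items() if n >= min_votes}


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 over multi-label examples.

    `gold` and `pred` are parallel lists of label sets, one set per example.
    A label with no true or predicted occurrences contributes 0.0 (a convention).
    """
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical zero-shot outputs from three models for a single post.
votes = [
    {"sadness", "suicide intent"},
    {"sadness"},
    {"sadness", "hopelessness"},
]
consensus = majority_vote(votes)  # only "sadness" reaches 2 of 3 votes
```

Because macro averaging weights every emotion equally, rare labels influence the score as much as frequent ones, which matters when emotion frequencies are imbalanced.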