Depression, a prevalent mental health disorder affecting millions of people worldwide, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression and also predicts its severity from speech input. We leverage self-supervised learning (SSL)-based speech models to make better use of the small datasets frequently encountered in this task. Our study demonstrates notable performance improvements from SSL embeddings over conventional speech features. We compare various types of SSL pretrained models to determine which type of speech information (semantic, speaker, or prosodic) contributes most to identifying different symptoms. Additionally, we evaluate how combining multiple SSL embeddings affects performance. Finally, we show the importance of multi-task learning for identifying depressive symptoms effectively.