Depression manifests through a diverse set of symptoms such as sleep disturbance, loss of interest, and concentration difficulties. However, most existing work treats depression prediction either as a binary label or as an overall severity score, without explicitly modeling symptom-specific information. This limits the ability of such models to provide the symptom-level analysis relevant to clinical screening. To address this, we propose a symptom-specific, clinically inspired framework for depression severity estimation from speech. Our approach uses a symptom-guided cross-attention mechanism that aligns PHQ-8 questionnaire items with emotion-aware speech representations to identify which segments of a participant's speech are most important to each symptom. To account for differences in how symptoms are expressed over time, we introduce a learnable symptom-specific parameter that adaptively controls the sharpness of the attention distributions. Results on EDAIC, a standard clinical-style dataset, show that our method outperforms prior work. Further, analysis of the attention distributions shows that higher attention is assigned to utterances containing cues related to multiple depressive symptoms, highlighting the interpretability of our approach. These findings underscore the importance of symptom-guided, emotion-aware modeling for speech-based depression screening.
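The core mechanism above can be illustrated with a minimal sketch: one query per PHQ-8 item attends over per-segment speech embeddings, and a learnable per-symptom temperature scales the attention logits so that each symptom can sharpen or flatten its own distribution. This is an assumption-laden illustration, not the paper's implementation: the function names, the use of scaled dot-product attention, and the NumPy forward pass (no training loop) are all hypothetical stand-ins for the actual model.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def symptom_guided_attention(symptom_emb, speech_emb, tau):
    """Hypothetical symptom-guided cross-attention.

    symptom_emb: (S, d) one query per PHQ-8 item (S = 8)
    speech_emb:  (T, d) emotion-aware embeddings of T speech segments
    tau:         (S,)   learnable per-symptom temperature; smaller tau
                        yields a sharper attention distribution
    Returns (S, d) symptom-specific speech summaries and (S, T) weights.
    """
    d = symptom_emb.shape[-1]
    logits = symptom_emb @ speech_emb.T / np.sqrt(d)      # (S, T) similarities
    weights = softmax(logits / tau[:, None], axis=-1)     # temperature-scaled
    return weights @ speech_emb, weights
```

In this sketch, each row of the returned weight matrix is the attention a single symptom places over the speech segments, which is what enables the kind of symptom-level attention analysis described in the abstract.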