Current automatic depression detection systems provide predictions directly, without relying on the individual symptoms/items of depression denoted in clinical depression rating scales. In contrast, clinicians assess each item of the depression rating scale in a clinical setting, thereby implicitly providing a more detailed rationale for a depression diagnosis. In this work, we take a first step towards using the acoustic features of speech to predict individual items of the depression rating scale before obtaining the final depression prediction. For this, we use convolutional (CNN) and recurrent (long short-term memory, LSTM) neural networks. We consider different approaches to learning the temporal context of speech, and we analyze two variants of voting schemes for individual item prediction and depression detection. We also include an animated visualization that shows an example of item prediction over time as the speech progresses.