Depression is a common mental disorder that significantly affects individuals and imposes a considerable burden on society. Its complexity and heterogeneity call for prompt and effective detection, which remains a difficult challenge and highlights an urgent need for improved detection methods. Exploiting speech data with advanced machine learning is a promising research direction. Yet existing techniques rely mainly on single-dimensional feature models, potentially neglecting the wealth of information carried by other speech characteristics. To address this, we present the Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection. ABAFnet combines four different acoustic features in a single deep learning model, thereby integrating multi-tiered features. We also introduce a novel weight adjustment module for late fusion that boosts performance by effectively synthesizing these features. We validate the approach extensively on two clinical speech databases, CNRAC and CS-NRAC, where it outperforms previous methods in both depression detection and subtype classification. Further in-depth analysis confirms the key role of each feature and highlights the importance of MFCC-related features in speech-based depression detection.