Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel BiLSTM-based tri-modal, model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture takes Mel Frequency Cepstral Coefficients (audio) and Facial Action Units (video) as input features, and uses a GPT-4 model with two-shot learning to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves strong results on both the DAIC-WOZ AVEC 2016 Challenge cross-validation split and a Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
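The reported F1-Score can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is a hypothetical helper and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Leave-One-Subject-Out values as reported in the abstract.
precision = 0.80
recall = 0.9286

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 85.95%"
```

The result agrees with the reported F1-Score of 85.95%, so the three figures are mutually consistent.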