Depression is a significant public health issue that profoundly affects individuals' psychological well-being. If left undiagnosed, it can lead to severe health problems, which may manifest physically and even result in suicide. Diagnosing depression, like other mental disorders, typically involves semi-structured interviews conducted by clinicians and mental health professionals, supplemented by questionnaires such as variants of the Patient Health Questionnaire (PHQ). This approach relies heavily on the experience and judgment of trained physicians, leaving the diagnosis susceptible to personal bias. Because the mechanisms underlying depression are still under active research, physicians often face challenges in diagnosing and treating the condition, particularly in its early stages of clinical presentation. Recently, significant strides have been made in artificial neural computing on problems involving text, image, and speech across various domains. Our analysis leverages these state-of-the-art (SOTA) models across multiple modalities to achieve optimal outcomes. Experiments were performed on the Extended Distress Analysis Interview Corpus Wizard-of-Oz (E-DAIC) dataset presented at the Audio/Visual Emotion Challenge (AVEC) 2019. The proposed solutions demonstrate that proprietary and open-source Large Language Models (LLMs) achieve a Root Mean Square Error (RMSE) of 3.98 on the textual modality, beating the AVEC 2019 challenge baseline and current SOTA regression architectures. Additionally, the proposed solution achieves an accuracy of 71.43% on the classification task. The paper also presents a novel audio-visual multi-modal network that predicts PHQ-8 scores with an RMSE of 6.51.