Depression is a critical concern in global mental health and has prompted extensive research into AI-based detection methods. Among AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation is their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the use of LLMs to identify and analyze depressive states remains relatively underexplored. In this paper, we present an innovative and efficient approach to multimodal depression detection that integrates acoustic speech information into the LLM framework by means of acoustic landmarks. Because acoustic landmarks are tied to the pronunciation of spoken words, incorporating them adds critical dimensions to text transcripts. This integration also captures the distinctive speech patterns of individuals, which can reveal their potential mental states. Evaluations on the DAIC-WOZ dataset show that the proposed approach achieves state-of-the-art results compared with existing audio-text baselines. Beyond depression detection, the approach offers a new perspective on enhancing the ability of LLMs to comprehend and process speech signals.
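The abstract does not specify how landmarks are presented to the LLM. One plausible way to make acoustic information usable by a text-only model is to serialize detected landmarks as tokens next to the transcript. The sketch below assumes landmarks have already been detected and are available as (time, label) pairs. The `Landmark` class, `landmarks_to_text`, `build_prompt`, the prompt template, and the label strings (`+g`, `+b`, `-g`) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    # Hypothetical representation of one detected acoustic landmark.
    time_s: float  # time of the landmark in seconds
    label: str     # illustrative label token, e.g. "+g" (assumed label set)


def landmarks_to_text(landmarks):
    """Serialize landmarks in temporal order as a space-separated token string."""
    ordered = sorted(landmarks, key=lambda lm: lm.time_s)
    return " ".join(lm.label for lm in ordered)


def build_prompt(transcript, landmarks):
    """Combine a transcript and its landmark tokens into a single text input.

    The template is a hypothetical example of giving a text-only LLM
    access to pronunciation-level acoustic cues.
    """
    return (
        f"Transcript: {transcript.strip()}\n"
        f"Acoustic landmarks: {landmarks_to_text(landmarks)}\n"
        "Question: Does the speaker show signs of depression? Answer yes or no."
    )


if __name__ == "__main__":
    lms = [Landmark(0.42, "-g"), Landmark(0.10, "+g"), Landmark(0.25, "+b")]
    print(build_prompt("I have been tired lately.", lms))
```

Sorting by time keeps the landmark tokens aligned with the order of speech, so a model reading the string sees acoustic events in the same sequence as the spoken words.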