Voice Biomarkers for Depression and Anxiety

Current approaches to detecting depression and anxiety from speech rely primarily on machine learning techniques that use hand-engineered paralinguistic features and related acoustic descriptors derived from time- and frequency-domain representations of the speech signal. Applying deep learning methods directly to raw speech signals has the potential to produce biomarker representations with substantially greater predictive power. However, these approaches typically require large volumes of carefully annotated data to learn robust and clinically meaningful representations of the underlying biomarkers. In this paper, we describe our efforts toward developing a deep learning model trained on a large-scale proprietary dataset of ~65,000 utterances collected from more than 23,000 subjects representative of relevant United States demographics. We present the techniques employed and analyze their impact on model performance. Our results demonstrate that the proposed models can extract content-agnostic biomarker information which, when combined with lexical features extracted from audio, yields improved predictive performance in production settings. Our models are evaluated on ~5,000 unique subjects and achieve 71% on both sensitivity and specificity. To foster further research in mental health assessment from speech, we release the best-performing model described in this paper on HuggingFace.
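For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity and specificity are computed from binary labels. It is not the authors' evaluation code: the function name `sensitivity_specificity` and the toy labels are illustrative assumptions, and the abstract does not state the decision threshold or label definitions used in the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumes labels are 0 (negative) or 1 (positive); this encoding is an
    assumption for illustration, not taken from the paper.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # positive case correctly flagged
        elif truth == 1 and pred == 0:
            fn += 1  # positive case missed
        elif truth == 0 and pred == 0:
            tn += 1  # negative case correctly cleared
        else:
            fp += 1  # negative case incorrectly flagged

    # Sensitivity: fraction of true positives that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true negatives that are cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example with made-up labels (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both values matters in a screening setting because a model can trivially maximize one at the expense of the other by flagging everyone or no one.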

