A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

We present a framework for recognizing Parkinson's disease (PD) from speech recordings of an English pangram, collected through a web application in diverse recording settings and environments, including participants' homes. Our dataset covers a global cohort of 1306 participants, 392 of whom have been diagnosed with PD. Leveraging the diversity of the dataset across demographic properties (such as age, sex, and ethnicity), we used deep learning embeddings derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind to represent the speech dynamics associated with PD. Our novel fusion model for PD classification, which aligns the different speech embeddings into a cohesive feature space, outperformed standard concatenation-based fusion models and other baselines (including models built on traditional acoustic features). In a randomized data split configuration, the model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 88.94% and an accuracy of 85.65%. Rigorous statistical analysis confirmed that the model performs equitably across demographic subgroups defined by sex, ethnicity, and age, and remains robust regardless of disease duration. Furthermore, when tested on two entirely unseen test datasets, one collected in clinical settings and one at a PD care center, the model maintained AUROC scores of 82.12% and 78.44%, respectively. These results affirm the model's robustness and its potential to enhance accessibility and health equity in real-world applications.
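For readers less familiar with the two reported metrics, the following is a minimal sketch of how AUROC and accuracy can be computed from binary labels and classifier scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions and are not taken from the study's data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy values for illustration only (1 = PD, 0 = non-PD); not data from the paper.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(f"AUROC:    {auroc(toy_labels, toy_scores):.4f}")
print(f"Accuracy: {accuracy(toy_labels, toy_scores):.4f}")
```

AUROC is independent of any decision threshold, whereas accuracy depends on the threshold chosen, which is one reason the two figures in the abstract differ.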
