We present a framework for recognizing Parkinson's disease (PD) from recordings of participants speaking an English pangram. The recordings were collected with a web application across diverse recording settings and environments, including participants' homes. Our dataset comprises a global cohort of 1306 participants, 392 of whom have been diagnosed with PD. The dataset is diverse across demographic properties such as age, sex, and ethnicity. Leveraging this diversity, we used deep learning embeddings from self-supervised models such as Wav2Vec 2.0 and WavLM, as well as from the multimodal ImageBind model, to represent the speech dynamics associated with PD. Our novel fusion model for PD classification aligns the different speech embeddings into a cohesive feature space. It outperformed standard concatenation-based fusion models and other baselines, including models built on traditional acoustic features. Under a randomized data split, the model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 88.94% and an accuracy of 85.65%. Rigorous statistical analysis confirmed that the model performs equitably across demographic subgroups defined by sex, ethnicity, and age, and that it remains robust regardless of disease duration. Furthermore, we tested the model on two entirely unseen datasets, one collected in clinical settings and one from a PD care center. It maintained AUROC scores of 82.12% and 78.44%, respectively. These results affirm the model's robustness and its potential to enhance accessibility and health equity in real-world applications.
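Concatenation-based fusion differs from fusion in a shared feature space, and a minimal NumPy sketch can make that difference concrete. This is not the authors' model. The embedding sizes, the shared dimension `SHARED_DIM`, the helper names, the random (untrained) linear projections, and the final averaging step are all hypothetical choices made for illustration. A real system would learn the projections, for example with an alignment objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-utterance embedding sizes (assumed for illustration only).
EMBED_DIMS = {"wav2vec2": 768, "wavlm": 1024, "imagebind": 1024}
SHARED_DIM = 256  # hypothetical size of the shared feature space


def concat_fusion(embeddings):
    """Baseline: place each model's embedding side by side."""
    return np.concatenate([embeddings[name] for name in EMBED_DIMS], axis=-1)


def make_projections(dims, shared_dim, rng):
    """One linear map per model into the shared space.

    In a trained system these would be learned; here they are random (assumption).
    """
    return {
        name: rng.normal(scale=1.0 / np.sqrt(d), size=(d, shared_dim))
        for name, d in dims.items()
    }


def aligned_fusion(embeddings, projections):
    """Project every embedding into the same space, L2-normalize, then average."""
    projected = []
    for name, weights in projections.items():
        z = embeddings[name] @ weights
        z = z / np.linalg.norm(z, axis=-1, keepdims=True)
        projected.append(z)
    return np.mean(projected, axis=0)


batch = 4
embeddings = {name: rng.normal(size=(batch, d)) for name, d in EMBED_DIMS.items()}
projections = make_projections(EMBED_DIMS, SHARED_DIM, rng)

fused_concat = concat_fusion(embeddings)
fused_aligned = aligned_fusion(embeddings, projections)
print(fused_concat.shape)   # (4, 2816)
print(fused_aligned.shape)  # (4, 256)
```

The concatenated vector grows with every added model (768 + 1024 + 1024 = 2816 here), and its parts live in unrelated coordinate systems. The shared-space variant keeps a fixed size and puts every model's contribution on the same scale before combining them.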