Early detection of Alzheimer's Dementia (AD) and Mild Cognitive Impairment (MCI) is critical for timely intervention, yet current diagnostic approaches remain resource-intensive and invasive. Speech, encompassing both acoustic and linguistic dimensions, offers a promising non-invasive biomarker for cognitive decline. In this study, we present a machine learning framework for the PROCESS Challenge, leveraging both audio embeddings and linguistic features derived from spontaneous speech recordings. Audio representations were extracted using Whisper embeddings from the Cookie Theft picture description task, while linguistic features (spanning pronoun usage, syntactic complexity, filler words, and clause structure) were obtained from transcriptions of the Semantic Fluency, Phonemic Fluency, and Cookie Theft picture description tasks. Classification models aimed to distinguish between Healthy Control (HC), MCI, and AD participants, while regression models predicted Mini-Mental State Examination (MMSE) scores. Results demonstrated that voted ensemble models trained on concatenated linguistic features achieved the best classification performance (F1 = 0.497), while Whisper embedding-based ensemble regressors yielded the lowest MMSE prediction error (RMSE = 2.843). Comparative evaluation within the PROCESS Challenge placed our models among the top submissions for the regression task and in the mid-range for classification, highlighting the complementary strengths of linguistic and audio embeddings. These findings reinforce the potential of multimodal speech-based approaches for scalable, non-invasive cognitive assessment and underline the importance of integrating task-specific linguistic and acoustic markers in dementia detection.
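As a rough illustration of the audio branch of this pipeline, the sketch below mean-pools Whisper encoder hidden states into a fixed-length utterance embedding and feeds the embeddings to a voting ensemble regressor for MMSE prediction. The checkpoint (`openai/whisper-base`), the mean-pooling step, and the choice of ensemble members are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np
import torch
from transformers import WhisperFeatureExtractor, WhisperModel
from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR

# Whisper encoder used as a frozen feature extractor.
# NOTE: the specific checkpoint is an assumption, not stated in the abstract.
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
encoder = WhisperModel.from_pretrained("openai/whisper-base").encoder
encoder.eval()

def whisper_embedding(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool Whisper encoder states into one fixed-length embedding per recording."""
    inputs = feature_extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(inputs.input_features).last_hidden_state  # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Hypothetical data: one embedding per Cookie Theft recording, paired with MMSE scores.
# X = np.stack([whisper_embedding(w) for w in waveforms]); y = np.array(mmse_scores)

# Voted ensemble regressor; the member models here are illustrative choices.
mmse_regressor = VotingRegressor([
    ("svr", SVR(C=1.0)),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])
# mmse_regressor.fit(X_train, y_train); preds = mmse_regressor.predict(X_val)
```

The same embeddings could equally feed a voting classifier for the HC/MCI/AD task; only the final estimator changes.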