This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate both acoustically rich and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts and build prototypes of healthy speech. Simple distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy in distinguishing recordings of people with aphasia from those of a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly extracting diagnostic speech biomarkers.
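The prototype-and-distance step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each recording has already been reduced to a numeric feature vector, that a prototype is the component-wise mean over healthy speakers, and that Euclidean and cosine distances are the distance measures; the abstract does not specify any of these choices. The feature values and function names are hypothetical.

```python
import math


def prototype(vectors):
    """Component-wise mean of the healthy speakers' feature vectors (assumed definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_features(vector, proto):
    """Distances to the healthy prototype, usable as classifier input features."""
    return [euclidean(vector, proto), cosine_distance(vector, proto)]


# Made-up toy vectors; the three columns are hypothetical transcript features
# (for example speech rate, filler ratio, type-token ratio).
healthy = [[2.9, 0.05, 0.62], [3.1, 0.04, 0.58], [3.0, 0.06, 0.60]]
proto = prototype(healthy)

candidate = [1.4, 0.30, 0.35]
features = distance_features(candidate, proto)
print(features)
```

The resulting distance vector would then be passed to any standard classifier. A recording that lies far from the healthy prototype yields larger distances than the healthy recordings themselves, and this is the signal the classifier can exploit.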