Rubrics are commonly used to label voice corpora in speech quality assessment, although their application to pathological speech remains relatively limited. In this study, we introduce a comprehensive rubric covering several dimensions of speech quality, including phonetics, fluency, and prosody. The objective is to establish standardized criteria for identifying errors in the speech of individuals with Down syndrome, thereby enabling the development of automated assessment systems. To this end, we used the Prautocal corpus. To assess the quality of the annotations produced with our rubric, we conducted two experiments, one on phonetics and one on fluency. For the phonetic evaluation, we computed the Goodness of Pronunciation (GoP) metric using automatic segmentation systems and correlated the results with assessments made by a specialized speech therapist. Although the correlation values obtained were not particularly high, a positive trend was observed. For the fluency assessment, we used deep learning models such as wav2vec to extract audio features and classified the Prautocal samples with an SVM classifier trained on a corpus designed for detecting fluency issues. The results highlight the difficulty of evaluating such phenomena, with performance varying according to the specific type of disfluency detected.
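The abstract does not state which GoP variant was used, so the following is only a minimal sketch of one common posterior-ratio formulation, not the authors' implementation. It assumes an acoustic model that outputs frame-level phone posteriors and an automatic segmentation (forced alignment) that gives each canonical phone's frame boundaries. For a phone p spanning frames [start, end), the score is the average over those frames of log P(p | o_t) minus log max_q P(q | o_t): it is 0 when the expected phone is the most likely one in every frame and grows more negative as competing phones dominate. The correlation step uses Pearson's r as an assumption; the abstract only says the scores were correlated with the therapist's ratings. The function names `gop` and `pearson` and all numbers in the usage block are hypothetical.

```python
import math
from typing import Sequence


def gop(posteriors: Sequence[Sequence[float]], phone: int, start: int, end: int) -> float:
    """Posterior-ratio Goodness of Pronunciation for one aligned phone segment.

    posteriors[t][q] is assumed to be P(q | o_t), a strictly positive
    (e.g. softmax) posterior for phone q at frame t. The segment covers
    frames start .. end-1, as given by a forced alignment.
    """
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    total = 0.0
    for t in range(start, end):
        frame = posteriors[t]
        # Log ratio between the canonical phone and the best competing phone.
        total += math.log(frame[phone]) - math.log(max(frame))
    # Normalize by segment duration so long and short phones are comparable.
    return total / (end - start)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between automatic scores and human ratings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up posteriors over 3 phones for 4 frames.
    post = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
    print(gop(post, phone=0, start=0, end=2))  # prints 0.0: phone 0 wins every frame
    print(gop(post, phone=0, start=2, end=4))  # negative: phone 1 dominates
    # Hypothetical per-utterance mean GoP vs. hypothetical therapist ratings.
    auto_scores = [-0.4, -1.2, -0.1, -2.0, -0.8]
    therapist = [4, 2, 5, 1, 3]
    print(pearson(auto_scores, therapist))
```

In practice, per-phone scores would be averaged per utterance or per word before being compared with the therapist's ratings; a rank correlation such as Spearman's would be a reasonable alternative if the ratings are ordinal.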