The growing prevalence of neurological disorders associated with dysarthria motivates automated intelligibility assessment methods that are applicable across languages. However, most existing approaches are either limited to a single language or fail to capture the language-specific factors that shape intelligibility. We present a multilingual phoneme-production assessment framework that integrates universal phone recognition with language-specific phoneme interpretation, using contrastive phonological feature distances for phone-to-phoneme mapping and sequence alignment. The framework yields three metrics: phoneme error rate (PER), phonological feature error rate (PFER), and a newly proposed alignment-free measure, phoneme coverage (PhonCov). Analyses of English, Spanish, Italian, and Tamil show that PER benefits from the combination of mapping and alignment, PFER from alignment alone, and PhonCov from mapping. Further analyses demonstrate that the proposed framework captures clinically meaningful patterns of intelligibility degradation consistent with established observations of dysarthric speech.
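To make the pipeline concrete, the following is a minimal sketch of phone-to-phoneme mapping followed by the three metrics. It is not the authors' implementation. The feature table `FEATURES`, its four toy binary features, the consonant inventory, and all function names are hypothetical and chosen only for illustration. PER is computed as the standard length-normalized edit distance. The feature-weighted variant and the set-based coverage are assumptions about how PFER and PhonCov might be defined, since the abstract does not give their formulas.

```python
from typing import Callable, List, Sequence

# Hypothetical toy feature table: (voiced, nasal, labial, continuant).
# A real system would use a full contrastive phonological feature set.
FEATURES = {
    "p": (0, 0, 1, 0), "b": (1, 0, 1, 0), "m": (1, 1, 1, 0),
    "f": (0, 0, 1, 1), "v": (1, 0, 1, 1),
    "t": (0, 0, 0, 0), "d": (1, 0, 0, 0), "n": (1, 1, 0, 0),
    "s": (0, 0, 0, 1), "z": (1, 0, 0, 1),
}


def feature_distance(a: str, b: str) -> float:
    """Fraction of features on which two segments differ (0.0 if identical)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def map_to_inventory(phones: Sequence[str], inventory: Sequence[str]) -> List[str]:
    """Map each recognized phone to the nearest phoneme of the target language.

    Ties are broken by the order of `inventory` (an arbitrary choice here).
    """
    return [min(inventory, key=lambda p: feature_distance(ph, p)) for ph in phones]


def edit_distance(ref: Sequence[str], hyp: Sequence[str],
                  sub_cost: Callable[[str, str], float]) -> float:
    """Levenshtein distance with unit insert/delete cost and a pluggable
    substitution cost, which also defines the alignment."""
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1.0,            # deletion
                            curr[j - 1] + 1.0,        # insertion
                            prev[j - 1] + sub_cost(r, h)))  # match/substitution
        prev = curr
    return prev[-1]


def per(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Phoneme error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp, lambda a, b: 0.0 if a == b else 1.0) / len(ref)


def feature_weighted_error(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed PFER-like score: substitutions cost their feature distance."""
    return edit_distance(ref, hyp, feature_distance) / len(ref)


def coverage(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed PhonCov-like score: share of distinct reference phonemes that
    appear anywhere in the hypothesis. No alignment is needed."""
    targets = set(ref)
    return len(targets & set(hyp)) / len(targets)


if __name__ == "__main__":
    inventory = ["p", "b", "m", "t", "d", "n", "s"]  # hypothetical language without /v/
    reference = ["b", "m", "t", "s"]
    recognized = ["v", "m", "d", "s"]                # output of a universal recognizer
    mapped = map_to_inventory(recognized, inventory)
    print(mapped, per(reference, recognized), per(reference, mapped),
          feature_weighted_error(reference, mapped), coverage(reference, mapped))
```

In this toy case the recognized phone [v] is not in the target inventory and is mapped to its nearest phoneme /b/, which removes one spurious substitution from PER and raises coverage, while the remaining /t/ to /d/ error is penalized only lightly by the feature-weighted score because the two segments differ in a single feature.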