The growing prevalence of neurological disorders associated with dysarthria motivates automated intelligibility assessment methods that are applicable across languages. However, most existing approaches are either limited to a single language or fail to capture the language-specific factors that shape intelligibility. We present a multilingual phoneme-production assessment framework that integrates universal phone recognition with language-specific phoneme interpretation, using contrastive phonological feature distances for both phone-to-phoneme mapping and sequence alignment. The framework yields three metrics: phoneme error rate (PER), phonological feature error rate (PFER), and a newly proposed alignment-free measure, phoneme coverage (PhonCov). Analyses of English, Spanish, Italian, and Tamil show that PER benefits from the combination of mapping and alignment, PFER from alignment alone, and PhonCov from mapping. Further analyses demonstrate that the proposed framework captures clinically meaningful patterns of intelligibility degradation consistent with established observations of dysarthric speech.
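To make the pipeline described above concrete, the following is a minimal sketch of feature-distance phone-to-phoneme mapping followed by PER and a coverage score. It is an illustration under stated assumptions, not the authors' implementation: the four-dimensional binary feature table, the Hamming distance, the function names, and in particular the definition of coverage used here (the fraction of distinct target phonemes that appear anywhere in the produced sequence) are all hypothetical, since the abstract does not specify them. PER follows the standard definition of edit distance divided by reference length.

```python
# Toy binary feature vectors (voiced, nasal, labial, continuant).
# Assumption: illustrative values only, not the paper's feature set.
FEATURES = {
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
    "f": (0, 0, 1, 1),
    "v": (1, 0, 1, 1),
}


def feature_distance(a, b):
    """Hamming distance between two toy feature vectors (assumed metric)."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def map_to_inventory(phone, inventory):
    """Map a recognized phone to the nearest phoneme of the target language.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(inventory, key=lambda ph: (feature_distance(phone, ph), ph))


def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Standard PER: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def phoneme_coverage(ref, hyp):
    """Assumed alignment-free coverage: share of distinct target phonemes
    that occur at least once in the produced sequence."""
    target = set(ref)
    if not target:
        raise ValueError("reference sequence must be non-empty")
    return len(target & set(hyp)) / len(target)


if __name__ == "__main__":
    inventory = ["p", "b", "m"]       # hypothetical language inventory
    reference = ["p", "b", "m"]       # target phoneme sequence
    recognized = ["f", "v", "m"]      # output of a universal phone recognizer
    mapped = [map_to_inventory(ph, inventory) for ph in recognized]
    print("PER without mapping:", phoneme_error_rate(reference, recognized))
    print("PER with mapping:   ", phoneme_error_rate(reference, mapped))
    print("Coverage with mapping:", phoneme_coverage(reference, mapped))
```

In this toy case the recognizer outputs phones outside the target inventory, so mapping them to their nearest in-inventory phonemes removes errors that reflect the recognizer's phone set rather than the speaker's production. A feature-weighted substitution cost in `edit_distance` would be the natural extension toward a PFER-style measure, but that is left out here because the abstract does not define it.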