Automatic speech recognition (ASR) systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects and varieties. This is a problem for a language like Irish, which has no single spoken standard but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of a speaker's dialect on recognition performance, 12 ASR systems were trained, first on baseline dialect-balanced training corpora and then on modified versions of those corpora in which dialect-specific materials were either subtracted or added. The results indicate that dialect-balanced corpora do not yield similar performance across the dialects: Ul consistently underperforms, whereas Mu yields the lowest word error rates (WERs). There is a close relationship between the Co and Mu dialects, but it is not symmetrical. These results will guide future corpus collection and system building strategies aimed at equitable performance across dialects.
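The per-dialect comparison described above rests on word error rate (WER) computed separately for each dialect's test utterances. The following is a minimal sketch of that kind of measurement, not the authors' evaluation pipeline: the function names `word_edit_distance` and `wer_by_dialect`, the `(dialect, reference, hypothesis)` tuple layout, and whitespace tokenisation are all assumptions made for illustration.

```python
from collections import defaultdict


def word_edit_distance(ref_tokens, hyp_tokens):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_word in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer_by_dialect(utterances):
    """Pool errors and reference words per dialect, then divide.

    `utterances` is an iterable of (dialect, reference, hypothesis) tuples.
    This layout is hypothetical; tokens are split on whitespace.
    """
    errors = defaultdict(int)
    ref_words = defaultdict(int)
    for dialect, reference, hypothesis in utterances:
        ref_tokens = reference.split()
        errors[dialect] += word_edit_distance(ref_tokens, hypothesis.split())
        ref_words[dialect] += len(ref_tokens)
    # Dialects with no reference words are skipped to avoid division by zero.
    return {d: errors[d] / ref_words[d] for d in ref_words if ref_words[d] > 0}
```

Pooling errors over all reference words in a dialect, rather than averaging per-utterance WERs, keeps short utterances from dominating the score. Calling `wer_by_dialect` on the output of each trained system and comparing the Ul, Co and Mu entries would give the kind of cross-dialect breakdown the abstract refers to.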