Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models

Accents are crucial in human communication: they help us understand others and allow us to speak intelligibly so that others understand us. Although ASR has progressed significantly, African-accented ASR remains understudied because training datasets are scarce, expensive to create, and demand substantial human labor. Our study addresses this problem by automating the annotation process and reducing annotation costs through informative, uncertainty-based data selection. We propose a new multi-round adaptation process that uses epistemic uncertainty, and we evaluate it across several domains, datasets, and high-performing ASR models. Our results show that our approach yields a 69.44\% WER improvement while requiring, on average, 45\% less data than established baselines. Our approach also improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models for African-accented speech. Moreover, our active learning experiments, which simulate real-world settings where no \textit{gold} transcriptions are available, show that our approach favors good-quality real-life transcriptions. These results indicate that our approach not only addresses the immediate issue of African-accented ASR but also has broader implications for improving ASR systems for other underrepresented and low-resource languages and accents. Our code is open-sourced at https://github.com/bonaventuredossou/active_learning_african_asr
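
The abstract does not say how epistemic uncertainty is estimated or how the selection rounds are organized, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes that, for each unlabeled utterance, the model produces several transcriptions from stochastic forward passes (for example, with dropout left active at inference, a common proxy for epistemic uncertainty), and it scores uncertainty as the mean pairwise word-level disagreement between those transcriptions. Each round then moves the k most uncertain utterances into the set to be annotated. The names `disagreement`, `select_most_uncertain`, `multi_round_selection`, and `sample_hypotheses` are hypothetical, and model fine-tuning between rounds is deliberately left out.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution (0 if match)
        prev = curr
    return prev[-1]


def disagreement(hypotheses: List[str]) -> float:
    """Assumed uncertainty proxy: mean pairwise word-level distance between
    stochastic transcriptions, each pair normalized by the longer one.
    0.0 means every pass produced the same words."""
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        wa, wb = a.split(), b.split()
        total += word_edit_distance(wa, wb) / max(len(wa), len(wb), 1)
    return total / len(pairs)


def select_most_uncertain(pool: Dict[str, List[str]], k: int) -> List[str]:
    """Return the ids of the k utterances with the highest disagreement."""
    ranked = sorted(pool, key=lambda uid: disagreement(pool[uid]), reverse=True)
    return ranked[:k]


def multi_round_selection(
    unlabeled: List[str],
    sample_hypotheses: Callable[[str], List[str]],  # hypothetical model hook
    k: int,
    rounds: int,
) -> List[List[str]]:
    """Each round scores the remaining pool and removes the top-k ids from it."""
    remaining = list(unlabeled)
    selected_per_round: List[List[str]] = []
    for _ in range(rounds):
        if not remaining:
            break
        pool = {uid: sample_hypotheses(uid) for uid in remaining}
        chosen = select_most_uncertain(pool, k)
        selected_per_round.append(chosen)
        chosen_set = set(chosen)
        remaining = [uid for uid in remaining if uid not in chosen_set]
        # A real pipeline would fine-tune the ASR model on the accumulated
        # selections here, so the next round's hypotheses would change.
    return selected_per_round


# Toy stand-in for three stochastic passes per utterance (made-up strings).
fake_passes = {
    "utt1": ["the cat sat", "the cat sat", "the cat sat"],
    "utt2": ["good morning", "could morning", "good mourning"],
    "utt3": ["hello world", "hello word", "hello world"],
}
print(multi_round_selection(list(fake_passes), fake_passes.__getitem__, k=1, rounds=3))
# prints [['utt2'], ['utt3'], ['utt1']]
```

In this toy run the passes for `utt2` disagree most, so it is selected first, while `utt1`, whose passes agree completely, is selected last. Because the sketch never retrains, the ranking stays fixed across rounds. In an actual multi-round adaptation, each round's fine-tuning would change which utterances the model is still uncertain about.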