While automatic speech recognition (ASR) has progressed significantly, African-accented clinical ASR remains understudied because training datasets are lacking. Building robust ASR systems in this domain requires large amounts of annotated data covering a wide variety of linguistically and morphologically rich accents, and such data are expensive to create. Our study addresses this problem by reducing annotation costs through informative, uncertainty-based data selection. We show that incorporating epistemic uncertainty into our adaptation rounds outperforms several baselines established with state-of-the-art (SOTA) ASR models while requiring less labeled data, and hence lower annotation costs. Our approach also improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models for accented African clinical speech, where training data are scarce.
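As a rough illustration of uncertainty-based data selection, the sketch below ranks unlabeled utterances by how much a model's error score varies across several stochastic forward passes and keeps the most variable ones for annotation. It is a minimal sketch under stated assumptions, not the procedure used in this work: using the variance across passes as a proxy for epistemic uncertainty, the function name `select_most_uncertain`, the utterance ids, and all scores are hypothetical.

```python
from statistics import pvariance


def select_most_uncertain(pass_scores, k):
    """Return the ids of the k samples whose scores vary most across passes.

    pass_scores maps a sample id to a list of per-pass scores (for example,
    an error estimate from each stochastic forward pass). The population
    variance of that list is used here as an ASSUMED uncertainty proxy.
    """
    uncertainty = {sid: pvariance(scores) for sid, scores in pass_scores.items()}
    # Sort by descending uncertainty; break ties by id for determinism.
    ranked = sorted(uncertainty, key=lambda sid: (-uncertainty[sid], sid))
    return ranked[:k]


# Hypothetical scores for three utterances over four stochastic passes.
pass_scores = {
    "utt_a": [0.10, 0.10, 0.10, 0.10],  # stable predictions, low uncertainty
    "utt_b": [0.05, 0.40, 0.20, 0.60],  # highly variable, high uncertainty
    "utt_c": [0.30, 0.35, 0.25, 0.30],  # mildly variable
}

selected = select_most_uncertain(pass_scores, k=2)
print(selected)
```

In an adaptation loop of this kind, the selected utterances would be sent for annotation and added to the training set before the next round, so the labeling budget goes to the samples the model is least certain about.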