Despite significant progress in automatic speech recognition (ASR), African-accented clinical ASR remains understudied because of a lack of training datasets. Building robust ASR systems in this domain requires large amounts of annotated data covering a wide variety of linguistically and morphologically rich accents, and such data is expensive to create. Our study addresses this problem by reducing annotation costs through uncertainty-based selection of informative data. We show that incorporating epistemic uncertainty into our adaptation rounds outperforms several baselines established with state-of-the-art (SOTA) ASR models while reducing the amount of labeled data required, and hence annotation costs. Our approach also improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models for accented African clinical speech, where training datasets are scarce.
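To make the idea of uncertainty-based data selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how epistemic uncertainty is estimated, so this example assumes each unlabeled utterance has been scored several times under stochastic forward passes (for instance, with dropout left on) and uses the variance of those scores as an uncertainty proxy. The function name `select_most_uncertain`, the `pass_scores` layout, and the utterance ids are all hypothetical.

```python
from statistics import pvariance


def select_most_uncertain(pass_scores, budget):
    """Return the `budget` utterance ids whose scores vary most across passes.

    pass_scores maps an utterance id to a list of per-pass scores
    (assumed here to come from repeated stochastic forward passes).
    """
    # Assumption: variance across passes serves as a proxy for epistemic uncertainty.
    uncertainty = {utt: pvariance(scores) for utt, scores in pass_scores.items()}
    # Sort by descending uncertainty; break ties by id so the result is deterministic.
    ranked = sorted(uncertainty, key=lambda utt: (-uncertainty[utt], utt))
    return ranked[:budget]


# Hypothetical pool of unlabeled utterances with three scores each.
pool = {
    "utt_a": [0.10, 0.12, 0.11],
    "utt_b": [0.05, 0.60, 0.30],
    "utt_c": [0.40, 0.42, 0.38],
    "utt_d": [0.00, 0.50, 0.25],
}
selected = select_most_uncertain(pool, budget=2)
print(selected)
```

In an adaptation round under these assumptions, only the selected utterances would be sent for annotation and added to the training set, which is how a selection step of this kind could reduce the amount of labeled data required.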