As Automatic Speech Recognition (ASR) models become increasingly pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations intended to simulate the various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices, such as discrete representations and self-training, appear to be conducive to robustness. We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and male and female speakers, and observe noticeable disparities in the models' robustness across subgroups. We believe that SRB will facilitate future research towards robust ASR models by making it easier to conduct comprehensive and comparable robustness evaluations.