As edge-based automatic speech recognition (ASR) technologies become increasingly prevalent in the development of intelligent and personalized assistants, three important challenges must be addressed for these resource-constrained ASR models: adaptivity, incrementality, and inclusivity. In this work, we propose a novel ASR framework, PI-Whisper, and show how it can adaptively improve an ASR model's recognition capabilities by identifying different speakers' characteristics in real time, how such adaptation can be performed incrementally without repetitive retraining, and how it can improve equity and fairness for diverse speaker groups. Moreover, PI-Whisper attains all of these properties while still achieving state-of-the-art accuracy, with up to a 13.7% reduction in word error rate (WER) and linear scalability with respect to computing resources.