Recent advances in Automatic Speech Recognition (ASR) have produced large AI models that are impractical to deploy on mobile devices. Model quantization is effective at producing compressed general-purpose models, yet in practice such models may be deployed only within a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small set of unlabelled samples from the target domain. To this end, we propose myQASR, a mixed-precision quantization method that generates tailored quantization schemes for diverse users under any memory requirement, with no fine-tuning. myQASR automatically evaluates the quantization sensitivity of network layers by analysing the full-precision activation values. We are then able to generate a personalized mixed-precision quantization scheme for any pre-determined memory budget. Results for large-scale ASR models show how myQASR improves performance for specific genders, languages, and speakers.
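The pipeline described above (score each layer from its full-precision activations on a few unlabelled target-domain samples, then choose per-layer bit-widths that fit a memory budget) can be illustrated with a minimal sketch. This is not the myQASR algorithm itself: the sensitivity score (mean absolute activation), the greedy allocation rule, the candidate bit-widths, and the names `layer_sensitivity` and `allocate_bits` are all assumptions made purely for illustration.

```python
from statistics import fmean


def layer_sensitivity(activations):
    """Hypothetical sensitivity score: mean absolute full-precision activation.

    `activations` is a flat iterable of floats collected for one layer while
    running a few unlabelled target-domain samples through the model.
    """
    return fmean(abs(a) for a in activations)


def allocate_bits(sensitivities, param_counts, budget_bits, choices=(8, 6, 4, 2)):
    """Greedy mixed-precision allocation (an assumed rule, not the paper's).

    Every layer starts at the highest bit-width in `choices`. Layers are then
    lowered one step at a time, least sensitive first, until the total weight
    storage (bit-width * parameter count, summed over layers) fits the budget.
    """
    order = sorted(range(len(sensitivities)), key=lambda i: sensitivities[i])
    bits = [choices[0]] * len(sensitivities)

    def total_size():
        return sum(b * n for b, n in zip(bits, param_counts))

    for level in choices[1:]:
        for i in order:
            if total_size() <= budget_bits:
                return bits
            bits[i] = level
    if total_size() > budget_bits:
        raise ValueError("memory budget cannot be met with the given bit-widths")
    return bits


# Toy calibration data: three layers, a handful of activation values each.
calibration = {
    "encoder.0": [0.05, -0.10, 0.15],
    "encoder.1": [0.40, -0.60, 0.50],
    "encoder.2": [1.00, -0.80, 0.90],
}
scores = [layer_sensitivity(v) for v in calibration.values()]
scheme = allocate_bits(scores, param_counts=[100, 100, 100], budget_bits=2000)
print(dict(zip(calibration, scheme)))
```

Because the scores come only from forward passes on unlabelled data, a sketch of this kind needs neither labels nor fine-tuning, and re-running it with another user's samples or another budget yields a different scheme.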