Recent advances in Automatic Speech Recognition (ASR) have produced large AI models that are impractical to deploy on mobile devices. Model quantization is effective at producing compressed general-purpose models; however, such models may be deployed only to a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small set of unlabelled samples from the target domain. To this end, we propose myQASR, a mixed-precision quantization method that generates tailored quantization schemes for diverse users under any memory requirement with no fine-tuning. myQASR automatically evaluates the quantization sensitivity of network layers by analysing the full-precision activation values. We are then able to generate a personalized mixed-precision quantization scheme for any pre-determined memory budget. Results for large-scale ASR models show how myQASR improves performance for specific genders, languages, and speakers.
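The idea of scoring layers from full-precision activations and then fitting a memory budget can be illustrated with a minimal sketch. This is not the myQASR algorithm itself: the abstract does not state the sensitivity statistic or the search procedure, so the median absolute activation used as a sensitivity proxy, the greedy bit-width reduction, the candidate bit-widths, and all function and parameter names below are assumptions made for illustration only.

```python
from statistics import median


def layer_sensitivity(activations):
    """Hypothetical sensitivity proxy: median absolute full-precision activation.

    Assumption: layers with larger activations are treated as more sensitive.
    """
    return median(abs(a) for a in activations)


def assign_bit_widths(activations_by_layer, params_by_layer, budget_bits,
                      choices=(8, 4, 2)):
    """Greedily pick a per-layer bit-width so the model fits within budget_bits.

    activations_by_layer: layer name -> full-precision activation values
        collected from a few unlabelled target-domain samples.
    params_by_layer: layer name -> number of weights in that layer.
    """
    choices = sorted(choices, reverse=True)
    sens = {name: layer_sensitivity(acts)
            for name, acts in activations_by_layer.items()}
    level = {name: 0 for name in sens}  # index into choices, 0 = widest

    def total_bits():
        return sum(params_by_layer[n] * choices[level[n]] for n in sens)

    # Visit the least sensitive layers first when lowering precision.
    order = sorted(sens, key=sens.get)
    while total_bits() > budget_bits:
        lowered = False
        for name in order:
            if level[name] < len(choices) - 1:
                level[name] += 1
                lowered = True
                if total_bits() <= budget_bits:
                    break
        if not lowered:
            raise ValueError("budget cannot be met with the given bit-widths")
    return {name: choices[level[name]] for name in sens}
```

In this sketch, no labels and no fine-tuning are involved: the only inputs are activation values recorded on target-domain samples, per-layer weight counts, and the budget. A different user's samples would give different sensitivities and therefore a different scheme for the same budget.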