Recent end-to-end automatic speech recognition (ASR) models have become increasingly large, making them particularly challenging to deploy on resource-constrained devices. Model quantisation is an effective solution, but it can cause the word error rate (WER) to increase. In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker adaptive training (SAT) with model quantisation to improve the performance of heavily compressed models. Specifically, PQM uses a 4-bit NormalFloat Quantisation (NF4) approach for model quantisation and low-rank adaptation (LoRA) for SAT. Experiments were performed on the LibriSpeech and TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size and 1% additional speaker-specific parameters, relative WER reductions of 15.1% and 23.3% were achieved on quantised Whisper and Conformer-based attention-based encoder-decoder ASR models respectively, compared with the original full-precision models.
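The combination of NF4 quantisation and LoRA described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the code levels are rounded approximations of those used in the QLoRA reference implementation, and the block size, rank, scaling factor, and the names `quantise_nf4`, `dequantise_nf4`, and `LoRALinear` are all assumptions chosen for illustration. The frozen weight is stored as 4-bit indices plus one scale per block, while only the small low-rank matrices would be trained per speaker.

```python
import numpy as np

# Approximate NF4 code levels, rounded to four decimals (an assumption,
# not taken from the paper). Weights are scaled into [-1, 1] first.
NF4_LEVELS = np.array([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
], dtype=np.float32)


def quantise_nf4(w, block_size=64):
    """Blockwise absmax quantisation: 4-bit indices plus one scale per block."""
    assert w.size % block_size == 0, "weight size must be a multiple of block_size"
    flat = w.reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    normed = flat / scales
    # Index of the nearest code level for every weight.
    idx = np.abs(normed[..., None] - NF4_LEVELS).argmin(axis=-1).astype(np.uint8)
    return idx, scales


def dequantise_nf4(idx, scales, shape):
    """Look up the code levels and undo the per-block scaling."""
    return (NF4_LEVELS[idx] * scales).reshape(shape)


class LoRALinear:
    """Frozen quantised weight plus a trainable low-rank update B @ A."""

    def __init__(self, w, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = w.shape
        self.shape = w.shape
        self.idx, self.scales = quantise_nf4(w)
        # Hypothetical speaker-specific parameters: A is small random, B is
        # zero, so the adapted layer initially equals the quantised layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim)).astype(np.float32)
        self.B = np.zeros((out_dim, rank), dtype=np.float32)
        self.scaling = alpha / rank

    def __call__(self, x):
        w_hat = dequantise_nf4(self.idx, self.scales, self.shape)
        return x @ w_hat.T + self.scaling * (x @ self.A.T) @ self.B.T

    def extra_param_fraction(self):
        """Ratio of LoRA parameters to the number of base weights."""
        return (self.A.size + self.B.size) / (self.shape[0] * self.shape[1])
```

In this sketch a 512 x 512 layer with rank 4 adds 4 x (512 + 512) = 4096 adapter parameters, about 1.6% of the base weights. The actual proportion in any real model depends on which layers are adapted and at what rank.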