Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to quantised and personalised end-to-end automatic speech recognition (ASR) models, combining test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments were performed on the LibriSpeech and TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, relative word error rate reductions of 29.1% and 31.1% were achieved on the quantised Whisper model and the Conformer-based attention-based encoder-decoder ASR model, respectively, compared to the original full-precision models.
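To make the core idea concrete, the following is a minimal sketch (not the authors' released implementation) of a mixture-of-LoRA-experts layer: a frozen base linear projection augmented with several low-rank adapters whose outputs are combined by a learned gating network. Names such as MoLoRALinear, num_experts, and rank are illustrative assumptions, and the gating and scaling choices follow common LoRA/MoE practice rather than the paper's exact configuration.

```python
# Sketch of a mixture-of-LoRA-experts linear layer (hypothetical; details assumed).
import torch
import torch.nn as nn


class MoLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay frozen (and may be quantised)
        d_in, d_out = base.in_features, base.out_features
        # K low-rank expert adapters: A (down-projection) and B (up-projection)
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.gate = nn.Linear(d_in, num_experts)  # per-token softmax gating
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_in)
        y = self.base(x)
        weights = torch.softmax(self.gate(x), dim=-1)  # (B, T, K) expert weights
        # low-rank expert outputs: (B, T, K, d_out)
        expert_out = torch.einsum("btd,kdr,kro->btko", x, self.lora_A, self.lora_B)
        # weighted sum of expert contributions added to the frozen base output
        y = y + self.scaling * torch.einsum("btk,btko->bto", weights, expert_out)
        return y


# Usage: wrap an existing projection inside an ASR encoder/decoder layer;
# only the adapters and the gate are trainable during speaker adaptation.
layer = MoLoRALinear(nn.Linear(256, 256), num_experts=4, rank=8)
out = layer(torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 50, 256])
```

Because only the gate and the rank-`r` adapters carry gradients, the number of trainable parameters per layer is a small fraction of the frozen base projection, which is what makes per-speaker adaptation of a compressed model practical on edge devices.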