The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50\% on the public LibriSpeech dataset and 3.67\% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based second-pass speech recognition models, we examine their robustness to input perturbations. The perturbations are based on homophone replacements, and their effect is quantified with a novel metric, N-best Perturbation-based Rescoring Robustness (NPRR); both are designed to measure the relative degradation in the performance of rescoring models. Our experimental results indicate that, compared with fully-tuned models and vanilla LoRA tuning baselines, advanced LoRA variants such as dynamic rank-allocated LoRA lead to performance degradation under $1$-best perturbation but alleviate the degradation under $N$-best perturbation. This suggests that the LoRA variant must be chosen carefully when LoRA-based adaptation is used for compute-cost savings and robust language modeling.
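For readers unfamiliar with LoRA, the following is a minimal sketch of the general idea of a low-rank update on a frozen weight matrix, not the training setup used in this study. The rank `r`, the scaling factor `alpha`, the layer sizes, and all function names are illustrative assumptions; the abstract does not specify them. It also shows why the approach is attractive on memory-constrained hardware: only the two small factors are trained.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(x, w_frozen, lora_a, lora_b, alpha):
    """Return W x + (alpha / r) * B (A x).

    w_frozen: d_out x d_in pretrained weight, kept fixed during adaptation.
    lora_a:   r x d_in trainable factor (hypothetical name).
    lora_b:   d_out x r trainable factor (hypothetical name).
    """
    r = len(lora_a)
    base = matvec(w_frozen, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]

def lora_trainable_params(d_out, d_in, r):
    """Number of trainable parameters in the two low-rank factors."""
    return r * (d_in + d_out)

# Toy dimensions, chosen only for illustration.
random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4.0
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # zero init: the adapted layer starts equal to the frozen one
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(x, w, a, b, alpha)

# Parameter count for an assumed 768 x 768 layer with rank 8.
full = 768 * 768
low_rank = lora_trainable_params(768, 768, 8)
print(full, low_rank)  # 589824 12288
```

With the assumed 768 x 768 layer and rank 8, the low-rank factors hold about 2\% of the parameters of the full matrix. Dynamic rank-allocated variants adjust `r` per layer during training rather than fixing it in advance.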