Running Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires efficient compression. While layer-wise post-training quantization is effective, it suffers from error accumulation, especially in encoder-decoder architectures. Existing solutions such as Quantization Error Propagation (QEP) are suboptimal for ASR because of these models' heterogeneity: the encoder processes acoustic features while the decoder generates text. To address this, we propose Fine-grained Alpha for Dynamic Quantization Error Propagation (FADE), which adaptively controls the trade-off between cross-layer error correction and local quantization. Experiments show that FADE significantly improves stability by reducing performance variance across runs, while also surpassing baselines in mean Word Error Rate (WER).
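The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the core idea as stated: a per-layer coefficient alpha blends full-precision calibration inputs (alpha = 0, plain layer-wise PTQ) with inputs from the already-quantized prefix of the network (alpha = 1, full error propagation as in QEP). The names `calibrate_layer` and `quantize_int8`, the symmetric int8 scheme, the bias-style output correction, and the example alpha values are all assumptions for illustration, not FADE itself; in particular, FADE's adaptive choice of alpha is not reproduced here.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 fake quantization of a weight matrix."""
    scale = np.abs(w).max() / 127.0 + 1e-12  # avoid division by zero
    return np.round(w / scale).clip(-127, 127) * scale

def calibrate_layer(w_fp, x_fp, x_q, alpha):
    """Quantize one linear layer with alpha-weighted error propagation.

    w_fp  : (out, in) full-precision weights.
    x_fp  : (in, batch) calibration inputs from the full-precision model.
    x_q   : (in, batch) inputs from the partially quantized model; these
            carry the accumulated quantization error of earlier layers.
    alpha : 0.0 -> purely local quantization (ignore upstream error),
            1.0 -> full error propagation as in QEP.
    """
    # Blend the two input distributions with the per-layer alpha.
    x_blend = alpha * x_q + (1.0 - alpha) * x_fp
    w_q = quantize_int8(w_fp)
    # Cheap output correction: a bias term absorbing the mean mismatch
    # between the full-precision target and the quantized layer's output.
    # (A real PTQ pipeline would solve a richer reconstruction problem.)
    bias = (w_fp @ x_fp - w_q @ x_blend).mean(axis=1, keepdims=True)
    return w_q, bias

# Toy usage: quantize a 3-layer stack, feeding each layer the activations
# of the already-quantized prefix so errors can be corrected downstream.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 16)) for _ in range(3)]
alphas = [0.3, 0.6, 0.9]  # illustrative per-layer values, not tuned

x_fp = rng.standard_normal((16, 256))  # calibration batch
x_q = x_fp.copy()
for w_fp, alpha in zip(layers, alphas):
    w_q, bias = calibrate_layer(w_fp, x_fp, x_q, alpha)
    x_fp = np.maximum(w_fp @ x_fp, 0.0)      # full-precision path (ReLU)
    x_q = np.maximum(w_q @ x_q + bias, 0.0)  # quantized path
print("final activation error:", np.abs(x_fp - x_q).mean())
```

An intermediate alpha lets each layer partially compensate upstream error without fully inheriting the distribution shift of the quantized path, which is the trade-off between cross-layer error correction and local quantization that the abstract describes.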