Deploying Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires aggressive low-bit weight quantization. Layer-wise post-training quantization is practical and effective, but it suffers from cross-layer error accumulation. Existing compensation methods typically apply a single global compensation strength to all layers. This is ill-suited to encoder-decoder ASR models, whose acoustic encoder and linguistic decoder differ markedly in their sensitivity to quantization noise. We propose FADE, a diagnostic-driven framework that assigns each layer an adaptive compensation coefficient by combining two complementary signals: an intrinsic vulnerability score derived from weight geometry and a calibration reliability score derived from the data-driven solution. The resulting layer-wise coefficient balances local quantization fidelity against cross-layer error correction, enabling tailored compensation without retraining or hyperparameter search. Experiments on Whisper, Moonshine, and Qwen3-ASR across four benchmarks show that FADE consistently lowers mean Word Error Rate (WER) relative to strong baselines at both 3-bit and 4-bit precision, while substantially reducing run-to-run variance.
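The abstract does not specify how the two scores are fused or how the coefficient enters the compensation objective. The sketch below is therefore only a hypothetical illustration of the general idea of a per-layer coefficient trading off local fidelity against cross-layer correction. It is not FADE's formulation. Every name (`layer_coefficient`, `compensated_weights`), the geometric-mean fusion, and the blended least-squares target are assumptions. Here `W` is a layer's full-precision weight matrix, `X_fp` holds the layer's inputs in the full-precision model, and `X_q` holds its inputs in the partially quantized model, so `X_q` already carries upstream error.

```python
import numpy as np


def layer_coefficient(vulnerability: float, reliability: float) -> float:
    """Hypothetical fusion of two per-layer scores into a coefficient in [0, 1].

    Assumption: compensate strongly only when the layer is both vulnerable and
    its calibration-based solution is reliable (geometric mean of the scores).
    """
    v = np.clip(vulnerability, 0.0, 1.0)
    r = np.clip(reliability, 0.0, 1.0)
    return float(np.sqrt(v * r))


def compensated_weights(W, X_fp, X_q, alpha, damp=1e-4):
    """Least-squares weights W_c minimizing ||T - W_c X_q||_F^2, where the
    hypothetical blended target is T = (1 - alpha) W X_q + alpha W X_fp.

    alpha = 0: reproduce the float layer on the quantized-model inputs (local fidelity).
    alpha = 1: also undo upstream error, matching the float model's output.
    Shapes: W is (out, in); X_fp and X_q are (in, n_samples).
    """
    target = (1.0 - alpha) * (W @ X_q) + alpha * (W @ X_fp)
    H = X_q @ X_q.T
    # Relative diagonal damping for numerical stability (a common heuristic).
    H = H + damp * np.mean(np.diag(H)) * np.eye(H.shape[0])
    # Normal equations: W_c H = T X_q^T, solved as H W_c^T = X_q T^T (H is symmetric).
    return np.linalg.solve(H, X_q @ target.T).T
```

In a full pipeline, the compensated weights would then be passed to the layer's quantizer. That rounding step is omitted here. Interpolating the target, rather than the weights, keeps both extremes well defined. With `alpha=0` and no damping, the solution recovers `W` exactly.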