Data-aware post-training quantization (PTQ) minimizes a per-token reconstruction loss on a small calibration corpus, implicitly weighting positions by their empirical frequency. For \textbf{A}utomatic \textbf{S}peech \textbf{R}ecognition (ASR), this misaligns with tail-sensitive risk: names, numerals, and domain-specific words receive proportionally little calibration mass. We propose \textbf{Tail-Aware Reconstruction Quantization} (\TARQ), a label-free PTQ framework that shifts calibration toward the lexical tail via \textbf{\rareBAL}, a closed-form per-Linear-layer rule equalizing common/tail mass, paired with a metric-consistent residual correction. \TARQ\ requires no entity labels, no curated calibration set, no validation decoding, and no additional training. Across eight ASR backbones and six datasets at W4G128, \TARQ\ improves mean rare-\textbf{W}ord \textbf{E}rror \textbf{R}ate (rare-WER) without an aggregate-WER regression, achieves the lowest cross-corpus rare-WER swing among compared methods, and transfers to entity-rich benchmarks (ProfASR, ContextASR-Speech-En) without entity supervision.
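
As an illustrative reading of the mass-equalization idea (our own notation, assumed for exposition; the exact rule used by \rareBAL{} is not stated here), split the calibration positions of one Linear layer into a common set $\mathcal{C}$ and a tail set $\mathcal{T}$, and give each position $t$ a nonnegative mass $m_t$ (for instance $m_t = 1$, or the input energy $\|x_t\|_2^2$). Assuming the tail mass is nonzero, a tail weight $\alpha$ that equalizes the two groups has the closed form
\[
\alpha \sum_{t \in \mathcal{T}} m_t = \sum_{t \in \mathcal{C}} m_t
\quad\Longrightarrow\quad
\alpha = \frac{\sum_{t \in \mathcal{C}} m_t}{\sum_{t \in \mathcal{T}} m_t},
\]
and a correspondingly reweighted reconstruction loss for weights $W$ and quantized weights $\widehat{W}$ would read
\[
\mathcal{L}_\alpha(\widehat{W}) = \sum_{t \in \mathcal{C}} \|(W - \widehat{W})\, x_t\|_2^2 + \alpha \sum_{t \in \mathcal{T}} \|(W - \widehat{W})\, x_t\|_2^2 .
\]
For example, if tail positions hold 5\% of the total mass, then $\alpha = 0.95 / 0.05 = 19$, whereas uniform per-token weighting corresponds to $\alpha = 1$.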