Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the lost accuracy. We propose a unified geometric framework that explains both PTQ failure and QAT recovery. We model full-precision training as following a low-loss \emph{river} inside a wider \emph{valley}: a normal neighborhood of the river forms a nearly flat \emph{basin}, while leaving this basin incurs a sharp loss increase. When the quantization grid is comparable to the basin width, local PTQ objectives, including rounding and Hessian-based second-order reconstruction, can select a high-loss deployed quantized point outside the basin even when nearby low-loss quantized points exist. In this regime, straight-through-estimator-based QAT has a useful bias: it evaluates gradients at the deployed quantized weights while updating latent full-precision weights, causing the gradient to sense the valley wall and acquire an inward component that steers subsequent quantized iterates back into the basin. We formalize this mechanism through a local landscape model, construct a geometric PTQ failure mode, and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple neural-network quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.
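The basin-crossing failure and the STE recovery described above can be illustrated with a one-dimensional toy cross-section taken normal to the river. This is a minimal sketch under stated assumptions, not the paper's formal landscape model or experimental setup: the piecewise-quadratic loss, the constants `DELTA`, `CENTER`, `HALF_WIDTH`, and `WALL`, and the helper names `loss`, `grad`, `quantize`, and `ste_qat` are all hypothetical and chosen only so that the basin width is comparable to the grid step.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
DELTA = 1.0        # quantization grid step
CENTER = 0.6       # basin centre along the direction normal to the river
HALF_WIDTH = 0.45  # basin half-width, comparable to DELTA
WALL = 100.0       # stiffness of the valley wall outside the basin


def loss(w):
    """Flat (zero) inside the basin, quadratic wall outside it."""
    excess = max(abs(w - CENTER) - HALF_WIDTH, 0.0)
    return 0.5 * WALL * excess ** 2


def grad(w):
    """Derivative of loss: zero inside the basin, pointing away from it outside."""
    excess = max(abs(w - CENTER) - HALF_WIDTH, 0.0)
    return WALL * excess * math.copysign(1.0, w - CENTER)


def quantize(w):
    """Round to the nearest point of the uniform grid with step DELTA."""
    return DELTA * math.floor(w / DELTA + 0.5)


def ste_qat(w_latent, lr=0.01, steps=10):
    """STE-style update: gradient is evaluated at the quantized weight,
    but the step is applied to the latent full-precision weight."""
    trajectory = [quantize(w_latent)]
    for _ in range(steps):
        g = grad(quantize(w_latent))
        w_latent -= lr * g
        trajectory.append(quantize(w_latent))
    return w_latent, trajectory


w_fp = 0.3                      # full-precision weight, inside the basin
w_ptq = quantize(w_fp)          # rounding lands on the grid point 0.0, outside the basin
w_latent, trajectory = ste_qat(w_fp)

print("full-precision loss:", loss(w_fp))
print("rounded (PTQ) weight and loss:", w_ptq, loss(w_ptq))
print("loss at neighbouring grid point 1.0:", loss(1.0))
print("quantized trajectory under STE:", trajectory)
```

In this toy, round-to-nearest selects a grid point on the valley wall even though the neighbouring grid point lies inside the basin. The gradient evaluated at the deployed quantized weight is nonzero there and pushes the latent weight inward until its quantized value re-enters the basin, after which the gradient vanishes and the iterate stays put.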