Quantization Error Reconstruction (QER) reduces accuracy loss in Post-Training Quantization (PTQ) by approximating weights as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$, using a rank-$r$ correction to reconstruct quantization error. Prior methods devote the full rank budget to error reconstruction, which is suboptimal when $\mathbf{W}$ has intrinsic low-rank structure and quantization corrupts its dominant directions. We propose Structured Residual Reconstruction (SRR), a rank-allocation framework that preserves the top-$k$ singular subspace of the activation-scaled weight before quantization, quantizes only the residual, and uses the remaining rank $r-k$ for error reconstruction. We derive a theory-guided criterion for selecting $k$ by balancing quantization-exposed energy against unrecoverable error under the rank constraint. We further show that the resulting $\mathbf{Q} + \mathbf{L}\mathbf{R}$ parameterization naturally supports Quantized Parameter-Efficient Fine-Tuning (QPEFT), and stabilizes fine-tuning via gradient scaling along the preserved directions. Experiments demonstrate consistent perplexity reductions across diverse models and quantization settings in PTQ, along with a 5.9 percentage-point average gain on GLUE under 2-bit QPEFT.
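The rank split described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it ignores activation scaling, uses a hypothetical uniform symmetric quantizer, and chooses `k` by hand rather than via the theory-guided criterion. The names `srr_decompose` and `quantize_symmetric` are assumptions for this sketch.

```python
import numpy as np

def quantize_symmetric(w, bits=2):
    # Hypothetical uniform symmetric per-tensor quantizer (stand-in
    # for whatever PTQ quantizer is actually used).
    levels = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(w)), 1e-12) / levels
    return np.round(w / scale).clip(-levels, levels) * scale

def srr_decompose(W, r, k, bits=2):
    """Split W into Q + L @ R under a total rank budget r:
    rank k preserves the top singular subspace of W before
    quantization; rank r - k reconstructs the quantization error
    of the residual."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Top-k component, kept in full precision inside the L, R factors.
    P = (U[:, :k] * s[:k]) @ Vt[:k]
    residual = W - P          # only the residual is quantized
    Q = quantize_symmetric(residual, bits)
    # Best rank-(r - k) approximation of the quantization error
    # (Eckart-Young), spent on error reconstruction.
    E = residual - Q
    Ue, se, Vte = np.linalg.svd(E, full_matrices=False)
    m = r - k
    L = np.concatenate([U[:, :k] * s[:k], Ue[:, :m] * se[:m]], axis=1)
    R = np.concatenate([Vt[:k], Vte[:m]], axis=0)
    return Q, L, R  # W is approximated as Q + L @ R
```

Setting `k = 0` recovers plain QER (the whole budget goes to error reconstruction); `k = r` preserves the top subspace but leaves the residual's quantization error uncorrected. The selection criterion in the paper trades these off.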