Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify the resulting quantization bottlenecks by casting fine-tuning as a smooth, constrained optimization problem. Our approach employs a fully differentiable Straight-Through Estimator (STE) with a learnable bit-width, noise scale, and clamp bounds, and enforces a target bit-width via an exterior-point penalty; mild metric smoothing (via distillation) stabilizes training. Despite its simplicity, the method attains competitive accuracy down to the extreme W1A1 setting while retaining the efficiency of STE.
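The sketch below illustrates, in PyTorch, the ingredients named above: a fake-quantizer whose rounding uses a straight-through estimator (STE) with a learnable bit-width, clamp bounds, and noise scale, plus an exterior-point penalty that is zero while the average bit-width satisfies the target and grows quadratically once it is violated. The class and function names (`LearnableFakeQuant`, `bitwidth_penalty`), the exact parameterization, and the uniform-dither form of the noise term are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, assuming a PyTorch-style fake-quantization setup.
import torch
import torch.nn as nn


class RoundSTE(torch.autograd.Function):
    """Round to the nearest integer; pass gradients straight through."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class LearnableFakeQuant(nn.Module):
    """Fake quantization with learnable bit-width, clamp bounds, and noise scale."""

    def __init__(self, init_bits=8.0, init_lo=-1.0, init_hi=1.0, init_noise=0.5):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))   # continuous, learnable bit-width
        self.lo = nn.Parameter(torch.tensor(init_lo))        # lower clamp bound
        self.hi = nn.Parameter(torch.tensor(init_hi))        # upper clamp bound
        self.noise = nn.Parameter(torch.tensor(init_noise))  # learnable noise scale

    def forward(self, x):
        bits = torch.clamp(self.bits, min=1.0)                # keep at least 1 bit
        levels = 2.0 ** bits - 1.0                            # number of quantization steps
        x = torch.clamp(x, self.lo, self.hi)                  # learnable clamp range
        scale = (self.hi - self.lo) / levels                  # step size from range and bits
        q = RoundSTE.apply((x - self.lo) / scale)             # quantize with STE gradient
        if self.training:
            # Assumed form of the learnable noise: uniform dither on the
            # quantization grid, scaled by a trainable parameter.
            q = q + self.noise * (torch.rand_like(q) - 0.5)
        return q * scale + self.lo                            # dequantize back to real values


def bitwidth_penalty(quantizers, target_bits, weight=1.0):
    """Exterior-point penalty on the average bit-width: zero when the constraint
    holds, quadratic in the violation once the average exceeds the target."""
    avg_bits = torch.stack([q.bits for q in quantizers]).mean()
    violation = torch.relu(avg_bits - target_bits)
    return weight * violation ** 2


if __name__ == "__main__":
    quant = LearnableFakeQuant(init_bits=4.0)
    x = torch.randn(8, 16)
    y = quant(x)                                              # fake-quantized activations
    loss = y.pow(2).mean() + bitwidth_penalty([quant], target_bits=2.0)
    loss.backward()                                           # gradients flow through the STE
    print(quant.bits.grad, quant.lo.grad, quant.hi.grad)
```

In this formulation the penalty term is simply added to the task (or distillation) loss, so driving the average bit-width toward the target and fine-tuning the quantized weights happen in a single smooth optimization, consistent with the exterior-point treatment described above.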