Quantization scale and bit-width are the most important parameters to choose when quantizing a neural network. Prior work optimizes quantization scales globally through gradient methods (gradient descent and Hessian analysis). Yet when we perturb the quantization scales, we observe a very jagged, highly non-smooth test loss landscape. In fact, small perturbations in quantization scale can greatly affect accuracy, yielding a $0.5\%$ to $0.8\%$ accuracy boost in 4-bit quantized vision transformers (ViTs). In this regime, gradient methods break down, since they cannot reliably reach local minima. Our method, dubbed Evol-Q, uses evolutionary search to effectively traverse the non-smooth landscape. Additionally, we propose an infoNCE loss, which not only helps combat overfitting on the small calibration dataset ($1{,}000$ images) but also makes such a highly non-smooth surface easier to traverse. Evol-Q improves the top-1 accuracy of a fully quantized ViT-Base by $10.30\%$, $0.78\%$, and $0.15\%$ for $3$-bit, $4$-bit, and $8$-bit weight quantization, respectively. Extensive experiments on a variety of CNN and ViT architectures further demonstrate its robustness in extreme quantization scenarios. Our code is available at https://github.com/enyac-group/evol-q
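To make the idea concrete, the following is a minimal sketch of evolutionary search over a single quantization scale, scored with an infoNCE-style contrastive loss. It is not the Evol-Q implementation from the repository above. The toy single linear layer, the function names (`quantize`, `info_nce`, `evolve_scale`), the min-max initial scale, the mutation size `sigma`, and the choice of positive pair (quantized and full-precision outputs for the same input, with other inputs in the batch as negatives) are all assumptions made for illustration.

```python
import numpy as np


def quantize(w, scale, bits):
    """Symmetric uniform fake-quantization: round to the grid, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def info_nce(q_feats, fp_feats, temperature=0.1):
    """InfoNCE loss over a batch.

    Assumption: row i of q_feats and row i of fp_feats form the positive
    pair; every other row of fp_feats acts as a negative.
    """
    eps = 1e-12
    q = q_feats / (np.linalg.norm(q_feats, axis=1, keepdims=True) + eps)
    f = fp_feats / (np.linalg.norm(fp_feats, axis=1, keepdims=True) + eps)
    logits = q @ f.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def evolve_scale(w, x, bits=4, generations=20, children=8, sigma=0.01, seed=0):
    """Gradient-free search: mutate the scale, keep a child only if it
    lowers the contrastive loss on the calibration batch x."""
    rng = np.random.default_rng(seed)
    fp_feats = x @ w  # full-precision reference outputs

    def loss(scale):
        return info_nce(x @ quantize(w, scale, bits), fp_feats)

    qmax = 2 ** (bits - 1) - 1
    best = np.abs(w).max() / qmax  # hypothetical min-max starting point
    init_loss = best_loss = loss(best)
    for _ in range(generations):
        # Small multiplicative perturbations around the current best scale.
        candidates = best * (1.0 + sigma * rng.standard_normal(children))
        for scale in candidates:
            if scale <= 0:
                continue
            candidate_loss = loss(scale)
            if candidate_loss < best_loss:
                best, best_loss = scale, candidate_loss
    return best, init_loss, best_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((16, 16))   # toy layer weights
    calib = rng.standard_normal((64, 16))     # toy calibration batch
    scale, before, after = evolve_scale(weights, calib, bits=4)
    print(f"scale={scale:.5f}  loss before={before:.5f}  after={after:.5f}")
```

Because each child is accepted only when it lowers the loss, the search never needs a gradient of the rounding operation, which is what makes it usable on a jagged landscape. A real setting would presumably search many scales across the layers of a full network rather than one scalar.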