Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible for reasons such as cost or privacy. Zero-shot quantization schemes generate synthetic data by exploiting the learned parameters ($\mu$ and $\sigma$) of the batch normalization layers in an FP32 pre-trained model. They then distill knowledge from the pre-trained model (teacher) to the quantized model (student) so that the quantized model can be optimized on the synthetic dataset. So far, however, zero-shot quantization has been discussed primarily in the context of quantization-aware training methods, which require task-specific losses and lengthy optimization comparable to retraining. We therefore introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. We further propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, with performance comparable to that of few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining the two, we bridge the gap between zero-shot and few-shot quantization while significantly improving quantization performance over existing approaches. In other words, we obtain a state-of-the-art zero-shot quantization approach.
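To make the role of the batch normalization statistics concrete, the following is a minimal sketch of statistics matching, not the Genie method itself. It assumes a squared-error penalty between the batch statistics of synthetic activations and the stored $\mu$ and $\sigma^2$, and it treats the synthetic inputs as the activations of a single layer (an identity "network"). The function names `bn_statistics_loss` and `synthesize`, the step count, and the learning rate are all hypothetical choices made for illustration.

```python
from statistics import fmean, pvariance


def bn_statistics_loss(activations, bn_mean, bn_var):
    """Squared distance between batch statistics and stored BN statistics.

    activations: list of samples, each a list with one value per channel.
    bn_mean, bn_var: per-channel mean and variance stored in a BN layer.
    The squared-error form is an assumption made for this sketch.
    """
    loss = 0.0
    for c in range(len(bn_mean)):
        channel = [sample[c] for sample in activations]
        batch_mean = fmean(channel)
        batch_var = pvariance(channel, mu=batch_mean)
        loss += (batch_mean - bn_mean[c]) ** 2 + (batch_var - bn_var[c]) ** 2
    return loss


def synthesize(initial, bn_mean, bn_var, steps=200, lr=0.1):
    """Gradient descent on the synthetic samples themselves.

    Uses the analytic gradient of bn_statistics_loss for an identity layer;
    a real network would need backpropagation through its layers.
    """
    data = [list(sample) for sample in initial]
    n = len(data)
    for _ in range(steps):
        for c in range(len(bn_mean)):
            channel = [sample[c] for sample in data]
            m = fmean(channel)
            v = pvariance(channel, mu=m)
            for i in range(n):
                # d/dx of (m - mu)^2 plus d/dx of (v - var)^2
                grad = (2 * (m - bn_mean[c]) / n
                        + 4 * (v - bn_var[c]) * (channel[i] - m) / n)
                data[i][c] -= lr * grad
    return data


# Usage: start from arbitrary samples and pull them toward the stored statistics.
start = [[0.0, 1.0], [1.0, 3.0], [2.0, -1.0], [-1.0, 0.5]]
target_mean, target_var = [0.5, -0.2], [1.0, 0.25]
synthetic = synthesize(start, target_mean, target_var)
```

After optimization, the batch of synthetic samples has per-channel mean and variance close to the stored values, which is the property that lets such data stand in for the unavailable training set during distillation.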