Current model quantization methods have shown promising capability in reducing storage space and computational complexity. However, because different hardware supports different quantization forms, existing solutions usually require repeated optimization for each deployment scenario. How to construct a model with flexible quantization forms has been less studied. In this paper, we explore a one-shot network quantization regime, named Elastic Quantization Neural Networks (EQ-Net), which aims to train a robust weight-sharing quantization supernet. First, we propose an elastic quantization space (including elastic bit-width, granularity, and symmetry) to adapt to various mainstream quantization forms. Second, we propose the Weight Distribution Regularization Loss (WDR-Loss) and Group Progressive Guidance Loss (GPG-Loss) to bridge the inconsistency in the distributions of weights and output logits across the elastic quantization space. Lastly, we incorporate genetic algorithms and the proposed Conditional Quantization-Aware Accuracy Predictor (CQAP) as an estimator to quickly search for mixed-precision quantized neural networks in the supernet. Extensive experiments demonstrate that EQ-Net performs close to or even better than its static counterparts as well as state-of-the-art robust bit-width methods. Code is available at \href{https://github.com/xuke225/EQ-Net.git}{https://github.com/xuke225/EQ-Net}.
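To make the three axes of the elastic quantization space concrete, the following is a minimal sketch of a uniform fake-quantizer whose bit-width, granularity (per-tensor or per-channel), and symmetry are chosen at call time. It is an illustration only and is not taken from the EQ-Net code base: the function name `fake_quantize`, its parameters, and the min/max calibration rule are all assumptions made for this example.

```python
import itertools

import numpy as np


def fake_quantize(w, bits=8, per_channel=False, symmetric=True):
    """Quantize then dequantize `w`, assumed to have shape [out_channels, ...].

    Hypothetical helper for illustration; uses simple min/max calibration.
    """
    w = np.asarray(w, dtype=np.float64)

    # Granularity: one range for the whole tensor, or one per output channel.
    axes = tuple(range(1, w.ndim)) if per_channel else None
    w_min = w.min(axis=axes, keepdims=True)
    w_max = w.max(axis=axes, keepdims=True)

    if symmetric:
        # Signed integer grid centered on zero, zero-point fixed at 0.
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax
        scale = np.maximum(np.abs(w_min), np.abs(w_max)) / qmax
    else:
        # Unsigned integer grid covering [w_min, w_max].
        qmin, qmax = 0, 2 ** bits - 1
        scale = (w_max - w_min) / (qmax - qmin)

    # Guard against constant tensors or channels (zero range).
    scale = np.where(scale == 0, 1.0, scale)

    if symmetric:
        zero_point = np.zeros_like(scale)
    else:
        zero_point = qmin - np.round(w_min / scale)

    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(4, 16))
    # Enumerate a small space: bit-width x granularity x symmetry.
    for bits, per_channel, symmetric in itertools.product(
        (2, 4, 8), (False, True), (True, False)
    ):
        w_q = fake_quantize(weights, bits, per_channel, symmetric)
        err = np.abs(weights - w_q).mean()
        print(f"bits={bits} per_channel={per_channel} "
              f"symmetric={symmetric} mean_abs_err={err:.4f}")
```

A single set of weights is pushed through every configuration here, which is the situation a weight-sharing supernet has to be robust to; the training losses and the search procedure described in the abstract are not modeled in this sketch.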