To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of any state-of-the-art algorithm may outperform the others on some network architectures, it is hard to prove that the strategy is always better than the others, let alone that it is the best choice for every layer in a network. In other words, existing quantization algorithms are suboptimal because they ignore the different characteristics of different layers and quantize all layers with a uniform quantization strategy. To address this issue, in this paper, we propose a differentiable quantization strategy search (DQSS) that assigns the optimal quantization strategy to each layer by taking advantage of the benefits of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to explore mixed quantization strategies from a global perspective via gradient-based optimization. We apply DQSS to post-training quantization so that the quantized models achieve performance comparable to their full-precision counterparts. We also employ DQSS in quantization-aware training to further validate its effectiveness. To circumvent the expensive optimization cost of employing DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass. Besides, we adjust the optimization process to avoid the potential under-fitting problem. Comprehensive experiments with various network architectures on a high-level computer vision task, i.e., image classification, and a low-level computer vision task, i.e., image super-resolution, show that DQSS outperforms the state of the art.
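The per-layer search described above can be sketched as a DARTS-style continuous relaxation: each layer mixes the outputs of several candidate quantizers with softmax-normalized architecture weights, so the strategy choice becomes differentiable. The candidate quantizers and all names below are illustrative assumptions for a minimal dependency-free sketch, not the paper's exact operators.

```python
import math

def quantize_uniform(x, bits=4):
    # Candidate strategy 1 (assumed): symmetric uniform quantizer.
    scale = max(abs(v) for v in x) / (2 ** (bits - 1) - 1)
    return [round(v / scale) * scale for v in x]

def quantize_pow2(x, bits=4):
    # Candidate strategy 2 (assumed): power-of-two quantizer.
    out = []
    for v in x:
        if v == 0.0:
            out.append(0.0)
        else:
            out.append(math.copysign(2.0 ** round(math.log2(abs(v))), v))
    return out

def softmax(alphas):
    # Softmax over the per-layer architecture parameters alpha.
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_quantize(x, alphas, candidates):
    # Continuous relaxation: the layer output is the softmax-weighted
    # sum of all candidate quantizers, so gradients can flow to alphas.
    w = softmax(alphas)
    outs = [q(x) for q in candidates]
    return [sum(w[i] * outs[i][j] for i in range(len(candidates)))
            for j in range(len(x))]

weights = [0.9, -0.5, 0.25]                      # toy layer weights
alphas = [0.0, 0.0]                              # equal preference initially
mixed = mixed_quantize(weights, alphas, [quantize_uniform, quantize_pow2])
```

After training, the layer would keep only the candidate with the largest architecture weight, which is how a mixed relaxation is typically discretized into one concrete strategy per layer.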