QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed-precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance under the same resource constraint than networks with homogeneous bitwidths. However, finding the optimal bitwidth allocation is a challenging problem, as the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate the bitwidth allocation problem as a constrained optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance that are guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that it outperforms existing fixed-precision and mixed-precision methods under average bitwidth constraints commonly found in the literature.
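
To make the idea of bitwidth allocation as a constrained optimization problem more concrete, here is a minimal sketch in Python. It is not the QBitOpt algorithm: the abstract does not specify the sensitivity measure or the solver, so this example swaps in a simple greedy heuristic and an assumed error model in which a layer's quantization error is its sensitivity times 4^(-bits). The function name `allocate_bitwidths`, its parameters, and the toy numbers are all hypothetical. The one property it shares with the approach described above is that the average-bitwidth constraint holds by construction rather than being encouraged through a gradient penalty.

```python
import heapq


def allocate_bitwidths(sensitivities, sizes, avg_bits, choices=(2, 3, 4, 5, 6, 7, 8)):
    """Greedy bitwidth allocation under an average-bitwidth budget.

    Assumed (hypothetical) error model: error_l = sensitivity_l * 4 ** (-bits_l).
    Constraint: sum(sizes[l] * bits[l]) <= avg_bits * sum(sizes).
    """
    n = len(sensitivities)
    budget = avg_bits * sum(sizes)
    idx = [0] * n                      # index into `choices` for each layer
    used = choices[0] * sum(sizes)     # every layer starts at the lowest bitwidth
    if used > budget:
        raise ValueError("budget is below the minimum bitwidth")

    def upgrade(layer):
        """Return (error drop per unit cost, cost) of the layer's next bitwidth step."""
        b0, b1 = choices[idx[layer]], choices[idx[layer] + 1]
        drop = sensitivities[layer] * (4.0 ** -b0 - 4.0 ** -b1)
        cost = sizes[layer] * (b1 - b0)
        return drop / cost, cost

    # Max-heap (via negated keys) of the best available upgrade per layer.
    heap = [(-upgrade(l)[0], l) for l in range(n)] if len(choices) > 1 else []
    heapq.heapify(heap)

    while heap:
        _, layer = heapq.heappop(heap)
        _, cost = upgrade(layer)
        if used + cost > budget:
            continue                   # never affordable again, since `used` only grows
        used += cost
        idx[layer] += 1
        if idx[layer] + 1 < len(choices):
            heapq.heappush(heap, (-upgrade(layer)[0], layer))

    return [choices[i] for i in idx]


# Toy usage: three equally sized layers with made-up sensitivities.
bits = allocate_bitwidths([10.0, 1.0, 0.1], sizes=[100, 100, 100], avg_bits=4)
print(bits)
```

In this sketch the more sensitive layers receive more bits, and the allocation can never exceed the budget because an upgrade is only applied when it fits. In a QAT setting, a routine like this would be re-run periodically with refreshed sensitivities, which is the general pattern the abstract describes; the actual sensitivities and solver used by QBitOpt may differ.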

