DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization

Efficiently deploying deep neural networks on low-resource edge devices is challenging because of their ever-increasing resource requirements. To address this issue, researchers have proposed multiplication-free neural networks, such as Power-of-Two quantization networks (also known as Shift networks), which aim to reduce memory usage and simplify computation. However, existing low-bit Shift networks are less accurate than their full-precision counterparts, typically because of limited weight-range encoding schemes and quantization loss. In this paper, we propose the DenseShift network, which significantly improves the accuracy of Shift networks and achieves performance competitive with full-precision networks for vision and speech applications. In addition, we introduce a method to deploy an efficient DenseShift network using non-quantized floating-point activations, obtaining a 1.6x speed-up over existing methods. To achieve this, we demonstrate that zero-valued weights in low-bit Shift networks do not contribute to model capacity and negatively impact inference computation. We therefore propose a zero-free shifting mechanism that simplifies inference and increases model capacity. We further propose a sign-scale decomposition design to enhance training efficiency and a low-variance random initialization strategy to improve the model's transfer learning performance. Our extensive experiments on various computer vision and speech tasks demonstrate that DenseShift outperforms existing low-bit multiplication-free networks and achieves performance competitive with full-precision networks. Furthermore, our proposed approach exhibits strong transfer learning performance without a drop in accuracy. Our code is released on GitHub.
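To make the idea of a zero-free power-of-two weight concrete, the following is a minimal Python sketch, not the DenseShift implementation. It assumes each weight is stored as a sign and a shift amount `p`, representing `sign * 2**(-p)`, so that no weight is ever zero. With floating-point activations, multiplying by such a weight reduces to an exponent adjustment, shown here with `math.ldexp`. The function names, the `max_shift` parameter, and the log-domain rounding rule are illustrative assumptions; the abstract does not specify the actual encoding.

```python
import math

def quantize_zero_free(w, max_shift=3):
    """Map a real weight to (sign, p), representing sign * 2**(-p).

    Assumption for illustration: p is clamped to 0..max_shift and chosen by
    rounding in the log2 domain. There is no zero code, so w == 0 falls back
    to the smallest representable magnitude with a positive sign.
    """
    sign = -1 if w < 0 else 1
    mag = abs(w)
    if mag == 0:
        return sign, max_shift
    p = round(-math.log2(mag))
    return sign, min(max(p, 0), max_shift)

def shift_multiply(x, sign, p):
    """Compute x * sign * 2**(-p) without a multiplication.

    math.ldexp(x, -p) scales x by 2**(-p) by adjusting the float exponent,
    and the sign is applied as a negation.
    """
    y = math.ldexp(x, -p)
    return -y if sign < 0 else y

def dot_shift(xs, weights):
    """Dot product of float activations with (sign, p) weights."""
    return sum(shift_multiply(x, s, p) for x, (s, p) in zip(xs, weights))

if __name__ == "__main__":
    weights = [quantize_zero_free(w) for w in (0.3, -0.9, 0.0)]
    print(weights)
    print(dot_shift([1.0, 2.0, 4.0], weights))
```

Because every weight is nonzero in this sketch, each term of the dot product follows the same negate-and-shift path with no special case for zero, which is one way to read the abstract's claim that removing zero simplifies inference.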

