Deep neural networks, despite their remarkable success across diverse tasks, demand substantial resources: computation, GPU memory, bandwidth, storage, and energy. Network quantization, a standard compression and acceleration technique, reduces storage costs and enables potential inference acceleration by discretizing network weights and activations into finite sets of integer values. However, current quantization methods are often complex and fragile, requiring extensive task-specific hyperparameter tuning, where even a single misconfiguration can degrade model performance and limit generality across models and tasks. In this paper, we propose Quantization without Tears (QwT), a method that simultaneously achieves speed, accuracy, simplicity, and generality in network quantization. The key insight of QwT is to incorporate a lightweight additional structure into the quantized network to mitigate the information loss incurred by quantization. This structure consists solely of a small set of linear layers, keeping the method simple and efficient. More importantly, its parameters admit a closed-form solution, which allows us to improve accuracy effortlessly in under 2 minutes. Extensive experiments across vision, language, and multimodal tasks demonstrate that QwT is both highly effective and versatile. Overall, QwT offers a robust solution for network quantization that combines simplicity, accuracy, and adaptability, providing new insights for the design of future quantization paradigms.
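To make the "closed-form solution" concrete, the following is a minimal sketch of how a linear compensation layer could be fitted by ridge-regularized least squares on a handful of calibration features. This is an illustrative reconstruction, not the paper's reference implementation: the function name fit_compensation, the argument names X, Y_fp, Y_q, and the ridge parameter are all assumptions introduced here.

```python
import numpy as np

def fit_compensation(X, Y_fp, Y_q, ridge=1e-4):
    """Closed-form (least-squares) fit of a linear compensation layer.

    Illustrative sketch only; names and the ridge term are assumptions.
    X    : (n, d_in)  inputs to a quantized block, gathered from a few
           calibration batches
    Y_fp : (n, d_out) full-precision block outputs on the same inputs
    Y_q  : (n, d_out) quantized block outputs on the same inputs
    Returns (W, b) such that Y_q + X @ W + b approximates Y_fp.
    """
    # The target is the quantization residual the linear layer must absorb.
    R = Y_fp - Y_q
    # Append a constant column so the bias b is fitted jointly with W.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    # Ridge-regularized normal equations: (Xa^T Xa + lam*I) Wb = Xa^T R.
    A = Xa.T @ Xa + ridge * np.eye(Xa.shape[1])
    Wb = np.linalg.solve(A, Xa.T @ R)
    return Wb[:-1], Wb[-1]  # W: (d_in, d_out), b: (d_out,)
```

Because the fit reduces to solving one small linear system per block, no gradient-based fine-tuning is needed, which is consistent with the abstract's claim that accuracy can be recovered in minutes.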