Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization, which together find mixed-precision configurations that are latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. On the 1000-class ImageNet dataset, we achieve up to a 28.6% reduction in end-to-end latency compared to an 8-bit model, at a negligible accuracy drop from a full-precision baseline. Even on systems with no hardware support for sub-byte arithmetic, we demonstrate speedups relative to an 8-bit baseline at a negligible accuracy drop. Furthermore, we show that our approach outperforms differentiable search that targets reduced binary operation counts as a proxy for latency.
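To make the second, hardware-aware stage more concrete, the following is a minimal sketch of one possible latency-driven heuristic. It is not taken from the paper: the function names `promote_free_bits` and `total_latency`, the per-layer latency table, and all numbers are hypothetical. The sketch assumes that a hardware-agnostic search has already assigned a bit-width to each layer and that per-layer latencies have been measured on the target for every supported bit-width. It then raises a layer's precision whenever doing so does not increase that layer's latency.

```python
from typing import Dict

# Hypothetical profiling data: latency[layer][bits] = measured cycles on the target.
LatencyTable = Dict[str, Dict[int, float]]


def total_latency(config: Dict[str, int], latency: LatencyTable) -> float:
    """Sum the per-layer latencies of a bit-width configuration."""
    return sum(latency[layer][bits] for layer, bits in config.items())


def promote_free_bits(config: Dict[str, int], latency: LatencyTable) -> Dict[str, int]:
    """Raise each layer to the highest bit-width that is no slower than its current one.

    Assumption: layers are profiled independently, so per-layer latencies add up.
    """
    result = {}
    for layer, bits in config.items():
        options = latency[layer]
        best = bits
        for candidate in sorted(options):
            # Accept a higher precision only if it costs no extra latency.
            if candidate > best and options[candidate] <= options[bits]:
                best = candidate
        result[layer] = best
    return result


# Illustrative numbers only (not measurements from the paper).
latency = {
    "conv1": {2: 120.0, 4: 100.0, 8: 90.0},  # sub-byte kernels slower on this target
    "conv2": {2: 50.0, 4: 70.0, 8: 110.0},   # sub-byte kernels faster on this target
}
searched = {"conv1": 2, "conv2": 4}  # stand-in for the output of the first stage
optimized = promote_free_bits(searched, latency)
print(optimized, total_latency(searched, latency), total_latency(optimized, latency))
```

In this toy table, `conv1` is promoted because its 8-bit kernel happens to be the fastest, while `conv2` keeps its searched precision. Because the rule depends only on the measured table, the same searched configuration can yield different final configurations on different targets.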

