Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward approach to compressing models and reducing hardware requirements, it fails to fully leverage the varying robustness across layers and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require an enormous brute-force design space search or lack the adaptability to meet varying hardware conditions, such as memory size, energy budget, and latency requirements. Filling these gaps, this work introduces \textbf{\textit{SigmaQuant}}, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage across varied edge environments without exhaustive search.