Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

As deep learning models grow in complexity and computational cost, effective methods for optimizing neural network designs become essential. This work introduces a search mechanism that automatically selects the bit-width and layer-width of each individual layer, markedly improving network efficiency. The search space is first reduced with Hessian-based pruning, which removes non-crucial parameters. A cluster-based tree-structured Parzen estimator is then used to build surrogate models of favorable and unfavorable outcomes, so that the architecture space can be explored efficiently and top-performing designs identified quickly. In tests on well-known datasets, the method outperforms existing approaches: compared with leading compression strategies it reduces model size by 20% without compromising accuracy, and compared with the best current search-based strategies it cuts search time by 12x. The method thus enables fast model design and deployment in resource-constrained settings.
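To make the surrogate-model idea concrete, here is a minimal sketch of a tree-structured Parzen estimator over per-layer bit-widths. It is not the authors' implementation. The abstract does not specify the clustering step, so the two-means split on observed losses is an assumption. The factorized categorical densities, the candidate set `BIT_CHOICES`, and all function names are also hypothetical choices made for illustration.

```python
import random
from collections import Counter

BIT_CHOICES = (2, 4, 8)  # hypothetical candidate bit-widths per layer


def split_by_clusters(losses, iters=20):
    """Two-means on 1-D losses; returns (good, bad) index lists.

    Assumption: stands in for the paper's clustering step, whose details
    are not given in the abstract.
    """
    lo, hi = min(losses), max(losses)
    good, bad = list(range(len(losses))), []
    for _ in range(iters):
        good = [i for i, v in enumerate(losses) if abs(v - lo) <= abs(v - hi)]
        bad = [i for i, v in enumerate(losses) if abs(v - lo) > abs(v - hi)]
        if not bad:
            break
        lo = sum(losses[i] for i in good) / len(good)
        hi = sum(losses[i] for i in bad) / len(bad)
    return good, bad


def categorical_density(configs, layer, alpha=1.0):
    """Smoothed frequency of each bit-width at one layer (a simple surrogate)."""
    counts = Counter(cfg[layer] for cfg in configs)
    total = len(configs) + alpha * len(BIT_CHOICES)
    return {b: (counts[b] + alpha) / total for b in BIT_CHOICES}


def suggest(history, n_layers, n_candidates=16, rng=None):
    """Propose the next configuration from (config, loss) observations.

    Candidates are sampled from the 'good' density l(x) and ranked by the
    ratio l(x) / g(x), where g(x) is the 'bad' density.
    """
    rng = rng or random.Random(0)
    configs = [cfg for cfg, _ in history]
    losses = [loss for _, loss in history]
    good_idx, bad_idx = split_by_clusters(losses)
    good = [configs[i] for i in good_idx]
    bad = [configs[i] for i in bad_idx]
    l = [categorical_density(good, k) for k in range(n_layers)]
    g = [categorical_density(bad, k) for k in range(n_layers)]

    def score(cfg):
        s = 1.0
        for k, b in enumerate(cfg):
            s *= l[k][b] / g[k][b]
        return s

    candidates = [
        tuple(
            rng.choices(BIT_CHOICES, weights=[l[k][b] for b in BIT_CHOICES])[0]
            for k in range(n_layers)
        )
        for _ in range(n_candidates)
    ]
    return max(candidates, key=score)
```

Calling `suggest(history, n_layers=2)` with a list of `(config, loss)` pairs returns one tuple of bit-widths to evaluate next. In a real search the loss would come from training or evaluating the quantized network, and layer widths could be added as further categorical dimensions in the same way.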

