"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

From arXiv: 32 pages, 4 figures, and 2 tables. This version fixes typos in Theorems 1 and 2 of the NeurIPS 2022 proceedings version (https://proceedings.neurips.cc/paper_files/paper/2022/hash/185087ea328b4f03ea8fd0c8aa96f747-Abstract-Conference.html).

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, so that they can be deployed on low-power IoT devices. In this paper, building upon recent advances in neural tangent kernel (NTK) and random matrix theory (RMT), we provide a novel compression approach to wide and fully-connected \emph{deep} neural nets. Specifically, we demonstrate that in the high-dimensional regime where the number of data points $n$ and their dimension $p$ are both large, and under a Gaussian mixture model for the data, there exists \emph{asymptotic spectral equivalence} between the NTK matrices for a large family of DNN models. This theoretical result enables "lossless" compression of a given DNN to be performed, in the sense that the compressed network yields asymptotically the same NTK as the original (dense and unquantized) network, with its weights and activations taking values \emph{only} in $\{ 0, \pm 1 \}$ up to a scaling. Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme, with code available at \url{https://github.com/Model-Compression/Lossless_Compression}.
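As a rough illustration of what weights "taking values only in $\{ 0, \pm 1 \}$ up to a scaling" can look like, here is a minimal NumPy sketch. The function name `ternarize`, the magnitude threshold, and the choice of scale (matching the mean square of the original weights) are assumptions made for illustration only. This is not the paper's scheme, which is designed so that the compressed network has asymptotically the same NTK as the original; see the linked repository for the authors' code.

```python
import numpy as np


def ternarize(weights, threshold):
    """Map each weight to scale * {-1, 0, +1} (illustrative, hypothetical helper).

    Entries with |w| <= threshold are set to 0 (sparsification); the remaining
    entries keep only their sign (quantization). The scale is chosen so that
    scale * signs has the same mean square as the input weights.
    """
    signs = np.sign(weights) * (np.abs(weights) > threshold)
    nonzero_fraction = np.mean(signs != 0)
    if nonzero_fraction == 0:
        # Every entry was pruned; no scale can restore the second moment.
        return signs, 0.0
    scale = np.sqrt(np.mean(weights ** 2) / nonzero_fraction)
    return signs, scale


# Example: compress a dense Gaussian weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))
T, scale = ternarize(W, threshold=0.7)  # threshold is an arbitrary choice
W_ternary = scale * T                   # entries lie in {0, +scale, -scale}
```

Only the sign pattern `T` and the single scalar `scale` need to be stored, which is where the memory saving over a dense floating-point matrix comes from.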

