Compressing neural network by tensor network with exponentially fewer variational parameters

A neural network (NN) designed for a challenging machine learning task is in general a highly nonlinear mapping with a massive number of variational parameters. If left unbounded or unconstrained, the high complexity of an NN can unpredictably cause severe problems, including over-fitting, loss of generalization power, and prohibitive hardware cost. In this work, we propose a general compression scheme that significantly reduces the number of variational parameters of an NN by encoding them in a deep automatically differentiable tensor network (ADTN) that contains exponentially fewer free parameters. The superior compression performance of our scheme is demonstrated on several widely recognized NNs (FC-2, LeNet-5, AlexNet, ZFNet and VGG-16) and datasets (MNIST, CIFAR-10 and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters to two ADTNs with just 424 parameters, while the testing accuracy on CIFAR-10 improves from $90.17\%$ to $91.74\%$. Our work suggests that the tensor network (TN) is an exceptionally efficient mathematical structure for representing the variational parameters of NNs, with superior compressibility compared with the commonly used matrices and multi-way arrays.
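The parameter-count claim can be illustrated with a small NumPy sketch. It is not the authors' ADTN: the abstract does not specify the network layout, so the brick-wall arrangement of $2\times2\times2\times2$ tensors, the fixed starting tensor, and the names `apply_two_site` and `build_tensor` are all assumptions made for illustration. The sketch only shows how a dense tensor with $2^{Q}$ entries can be generated from a number of small-tensor entries that grows linearly with $Q$.

```python
import numpy as np

def apply_two_site(state, gate, i):
    """Contract a (2,2,2,2) tensor with axes i and i+1 of a rank-Q tensor."""
    # Assumed index order of `gate`: (out_i, out_j, in_i, in_j).
    out = np.tensordot(gate, state, axes=([2, 3], [i, i + 1]))
    # tensordot places the two output axes first; move them back to i, i+1.
    return np.moveaxis(out, [0, 1], [i, i + 1])

def build_tensor(gates, Q):
    """Contract layers of small tensors into a dense tensor with 2**Q entries."""
    state = np.zeros((2,) * Q)
    state[(0,) * Q] = 1.0  # fixed starting tensor, carries no free parameters
    for layer, layer_gates in enumerate(gates):
        start = layer % 2  # alternate even/odd pairs (assumed brick-wall pattern)
        for k, gate in enumerate(layer_gates):
            state = apply_two_site(state, gate, start + 2 * k)
    return state

Q, n_layers = 16, 4  # hypothetical sizes, not taken from the paper
rng = np.random.default_rng(0)
gates = []
for layer in range(n_layers):
    n_gates = (Q - layer % 2) // 2
    gates.append([rng.normal(size=(2, 2, 2, 2)) for _ in range(n_gates)])

dense = build_tensor(gates, Q)
n_small = sum(g.size for layer_gates in gates for g in layer_gates)
print(dense.size, n_small)
```

With these hypothetical sizes, 480 small-tensor entries generate a dense tensor with 65,536 entries. In a trainable version, the small tensors would be the free parameters updated by automatic differentiation, and the dense tensor would be reshaped into a layer's weights.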

