SWAP: Sparse Entropic Wasserstein Regression for Robust Network Pruning

This study addresses a problem in neural network pruning: the gradients used to compute the empirical Fisher Information Matrix (FIM) can be inaccurate. We introduce SWAP, an Entropic Wasserstein regression (EWR) formulation of network pruning that capitalizes on the geometric properties of the optimal transport (OT) problem. We show analytically that the "swap" of the commonly used standard linear regression (LR) for EWR in the optimization excels at noise mitigation by adopting neighborhood interpolation across data points, while incurring only marginal extra computational cost. The unique strength of SWAP is its intrinsic ability to balance noise reduction against preservation of covariance information. Extensive experiments on various networks show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. SWAP outperforms the SoTA when the network size or the target sparsity is large, and the gain is even larger in the presence of noisy gradients, which may arise from noisy data, analog memory, or adversarial attacks. Notably, for MobileNetV1 with less than one-fourth of the network parameters remaining, SWAP improves accuracy by 6% and testing loss by 8%.
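To give some intuition for "neighborhood interpolation", the sketch below computes a generic entropic OT plan with Sinkhorn iterations and uses it to average each target over nearby points. It is not the SWAP algorithm from the paper: the function name `sinkhorn_plan`, the regularization parameter `eps`, the 1-D toy points, and the squared-distance cost are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn_plan(x, y, eps, n_iters=500):
    """Entropic OT plan between uniform empirical measures on 1-D points.

    Illustrative only: a textbook Sinkhorn loop, not the paper's solver.
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)                   # uniform source weights
    b = np.full(m, 1.0 / m)                   # uniform target weights
    cost = (x[:, None] - y[None, :]) ** 2     # squared-distance cost matrix
    K = np.exp(-cost / eps)                   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                       # match row marginals
        v = b / (K.T @ u)                     # match column marginals
    return u[:, None] * K * v[None, :]

# Toy data (hypothetical): five evenly spaced points matched to themselves.
x = np.linspace(0.0, 1.0, 5)
y = x.copy()

plan_sharp = sinkhorn_plan(x, y, eps=0.01)    # weak regularization
plan_diffuse = sinkhorn_plan(x, y, eps=10.0)  # strong regularization

# Barycentric projection: each source point gets a plan-weighted
# average of the targets, i.e. an interpolation over its neighbors.
interp_sharp = (plan_sharp @ y) / plan_sharp.sum(axis=1)
interp_diffuse = (plan_diffuse @ y) / plan_diffuse.sum(axis=1)
```

With a small `eps` the plan stays close to a one-to-one matching, so each point keeps roughly its own target, which is the pointwise pairing that ordinary regression uses. With a large `eps` the plan spreads mass over many neighbors and the interpolated targets move toward the overall mean. An intermediate value trades noise averaging against fidelity to the individual data points, which is the kind of balance the abstract describes.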

