This study addresses a problem in neural network pruning: the gradients used to compute the empirical Fisher Information Matrix (FIM) can be inaccurate. We introduce SWAP, an Entropic Wasserstein regression (EWR) formulation of network pruning that capitalizes on the geometric properties of the optimal transport (OT) problem. We show analytically that the "swap" of the commonly used standard linear regression (LR) for EWR in the optimization excels at noise mitigation by adopting neighborhood interpolation across data points, while incurring only marginal extra computational cost. The unique strength of SWAP is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments on various networks show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large, and the gain is even larger in the presence of noisy gradients, which may arise from noisy data, analog memory, or adversarial attacks. Notably, it improves accuracy by 6% and testing loss by 8% for MobileNetV1 with less than one-fourth of the network parameters remaining.
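To make the notion of neighborhood interpolation more concrete, the following is a minimal sketch of an entropic OT transport plan between two one-dimensional point sets, computed with standard Sinkhorn iterations. It is not the SWAP algorithm itself. The function name `sinkhorn_plan`, the squared-distance cost, the uniform weights, and the parameters `eps` and `n_iters` are all illustrative assumptions rather than details taken from this work.

```python
import numpy as np

def sinkhorn_plan(x, y, eps, n_iters=500):
    """Entropic OT plan between 1-D point sets x and y (illustrative sketch).

    Assumptions (not from the source): uniform weights on both point sets,
    squared-distance cost, and a fixed number of Sinkhorn iterations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)              # uniform source weights
    b = np.full(m, 1.0 / m)              # uniform target weights
    C = (x[:, None] - y[None, :]) ** 2   # pairwise squared-distance cost
    K = np.exp(-C / eps)                 # Gibbs kernel for regularization eps
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale to match row marginals
        v = b / (K.T @ u)                # rescale to match column marginals
    return u[:, None] * K * v[None, :]   # transport plan with marginals a, b
```

Under these assumptions, calling `sinkhorn_plan(x, x, eps)` with a small `eps` concentrates mass near the diagonal, so each point is matched almost only to itself. A larger `eps` spreads mass over neighboring points, which is the kind of averaging across data points that the abstract credits with noise mitigation.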