This study addresses the problem of inaccurate gradients when computing the empirical Fisher Information Matrix during neural network pruning. We introduce SWAP, a formulation of Entropic Wasserstein Regression (EWR) for pruning that capitalizes on the geometric properties of the optimal transport problem. We show analytically that this ``swap'' of the commonly used linear regression for EWR in the optimization mitigates noise by interpolating across neighboring data points, at only marginal additional computational cost. A distinctive strength of SWAP is its intrinsic ability to balance noise reduction against the preservation of covariance information. Extensive experiments on various networks and datasets show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. It outperforms the SoTA when the network is large or the target sparsity is high, and the gain grows further in the presence of noisy gradients, which may arise from noisy data, analog memory, or adversarial attacks. Notably, for MobileNetV1 with less than one-fourth of the network parameters remaining, SWAP improves accuracy by 6% and testing loss by 8%.
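The abstract does not state the EWR objective, so the following is only a generic illustration of the mechanism it alludes to, not SWAP's algorithm. A least-squares fit compares each prediction with its own target alone, whereas an entropic optimal transport plan lets each prediction share mass with several nearby targets; the plan's barycentric projection then acts like a local average of targets. The sketch runs standard Sinkhorn iterations between two sets of 1-D points with uniform marginals and squared-distance cost. The names `sinkhorn_plan` and `barycentric_targets`, the toy data, and the `eps` values are hypothetical choices made for this example.

```python
import math


def sinkhorn_plan(x, y, eps, n_iter=500):
    """Entropic OT plan between uniform empirical measures on 1-D points.

    Generic Sinkhorn scaling (illustrative assumption, not SWAP's exact
    formulation): cost C_ij = (x_i - y_j)^2, entropic strength eps.
    """
    n, m = len(x), len(y)
    a = [1.0 / n] * n  # uniform source marginal
    b = [1.0 / m] * m  # uniform target marginal
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-((xi - yj) ** 2) / eps) for yj in y] for xi in x]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale so row sums match a and column sums match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_targets(P, y):
    """Weighted average of the targets each source point is coupled to."""
    return [sum(p * yj for p, yj in zip(row, y)) / sum(row) for row in P]


# Toy data (hypothetical): predictions x and slightly perturbed targets y
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 0.9, 2.2, 2.8]

P_small = sinkhorn_plan(x, y, eps=0.05)  # nearly one-to-one matching
P_large = sinkhorn_plan(x, y, eps=5.0)   # mass spread over neighbors

y_hat_small = barycentric_targets(P_small, y)
y_hat_large = barycentric_targets(P_large, y)
```

With a small `eps` the plan is almost diagonal, so the effective targets stay close to `y`; with a large `eps` each point is coupled to several neighbors and its effective target is pulled toward a local average. This smoothing-versus-fidelity knob is one way to picture the tradeoff between noise reduction and covariance preservation described above; how SWAP itself sets that balance is not specified here.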