Sparse neural networks are a key factor in developing resource-efficient machine learning applications. We propose Adaptive Regularized Training (ART), a novel sparse learning method that compresses dense networks into sparse ones. Instead of applying a binary mask during training to reduce the number of model weights, as is common, we shrink weights toward zero iteratively by increasing the strength of weight regularization. Our method compresses the knowledge of the pre-trained model into the weights of highest magnitude. To this end, we introduce a novel regularization loss named HyperSparse that exploits the highest-magnitude weights while preserving the ability to explore other weights. Extensive experiments on CIFAR and TinyImageNet show that our method yields notable performance gains over other sparsification methods, especially in extremely high sparsity regimes of up to 99.8 percent model sparsity. Additional investigations provide new insights into the patterns encoded in high-magnitude weights.
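The general idea of shrinking weights with a steadily increasing regularization strength, rather than masking them, can be illustrated on a toy problem. The following is a minimal sketch under strong assumptions and is not the ART procedure or the HyperSparse loss, neither of which is specified in this abstract: it uses a plain L1 penalty as a stand-in, a linear least-squares model instead of a neural network, and a dense least-squares fit in place of a pre-trained model. The names `soft_threshold` and `shrink_with_growing_l1`, and the parameters `lam0`, `growth`, `lr`, and `keep`, are hypothetical.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal step of an L1 penalty: move every weight toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def shrink_with_growing_l1(X, y, w_dense, keep,
                           lam0=1e-3, growth=1.1, lr=0.1, max_epochs=500):
    """Start from dense weights and raise the L1 strength every epoch
    until at most `keep` weights remain non-zero.

    Assumption: plain L1 stands in for the paper's HyperSparse loss.
    """
    w = w_dense.copy()
    lam = lam0
    n = len(y)
    for _ in range(max_epochs):
        grad = X.T @ (X @ w - y) / n                  # gradient of the mean squared error
        w = soft_threshold(w - lr * grad, lr * lam)   # gradient step, then L1 shrinkage
        if np.count_nonzero(w) <= keep:               # target sparsity reached
            break
        lam *= growth                                 # increase the regularization strength
    return w, lam


# Toy data: 20 features, only the first 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.normal(size=200)

# Dense fit plays the role of the pre-trained model in this sketch.
w_dense, *_ = np.linalg.lstsq(X, y, rcond=None)
w_sparse, lam_final = shrink_with_growing_l1(X, y, w_dense, keep=3)
print(np.count_nonzero(w_dense), "->", np.count_nonzero(w_sparse))
```

In this toy setting no weight is ever removed by a mask. Small weights reach exactly zero on their own as the penalty grows, while the largest weights survive, which mirrors the intuition of concentrating the model's knowledge in the highest-magnitude weights.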