Learning Activation Functions for Sparse Neural Networks

Sparse Neural Networks (SNNs) can potentially match the performance of their dense counterparts while saving significant energy and memory at inference. However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions. While recent works mitigate this issue through sophisticated pruning techniques, we shift our focus to an overlooked factor: hyperparameters and activation functions. Our analyses show that the accuracy drop can additionally be attributed to (i) using ReLU universally as the default activation function, and (ii) fine-tuning SNNs with the same hyperparameters as their dense counterparts. Thus, we focus on learning a novel way to tune activation functions for sparse networks and combining it with a separate hyperparameter optimization (HPO) regime for sparse networks. By conducting experiments on popular DNN models (LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0) trained on the MNIST, CIFAR-10, and ImageNet-16 datasets, we show that the novel combination of these two approaches, dubbed Sparse Activation Function Search (SAFS for short), yields absolute accuracy improvements of up to 15.53%, 8.88%, and 6.33% for LeNet-5, VGG-16, and ResNet-18, respectively, over the default training protocols, especially at high pruning ratios. Our code can be found at https://github.com/automl/SAFS.
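
To make the term "pruning ratio" concrete, the following is a minimal sketch of unstructured magnitude pruning on a single weight matrix, with the activation function passed in as a parameter so that it can be swapped. It is an illustration only and not the SAFS implementation: the helper names `magnitude_prune` and `sparse_forward`, the choice of magnitude pruning, and the toy shapes are all assumptions made for this example.

```python
import numpy as np

def magnitude_prune(weights, pruning_ratio):
    """Zero out the smallest-magnitude fraction of weights (hypothetical helper)."""
    flat = np.abs(weights).ravel()
    k = int(round(pruning_ratio * flat.size))  # number of weights to remove
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest magnitudes (order within them is irrelevant).
        drop = np.argpartition(flat, k - 1)[:k]
        mask[drop] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

def relu(x):
    return np.maximum(x, 0.0)

def sparse_forward(x, weights, mask, activation=relu):
    """One sparse layer; `activation` is swappable (ReLU is only the default)."""
    return activation(x @ (weights * mask))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                      # toy dense layer, 32 weights
W_sparse, mask = magnitude_prune(W, pruning_ratio=0.75)
sparsity = 1.0 - mask.mean()
print(f"sparsity: {sparsity:.2f}")               # prints "sparsity: 0.75"

# The same sparse layer evaluated with a different activation function.
out = sparse_forward(np.ones((2, 8)), W, mask, activation=np.tanh)
```

At a pruning ratio of 0.75, only 8 of the 32 weights survive. The abstract's claim concerns what happens after this step: which activation function and which fine-tuning hyperparameters suit the remaining weights.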

Related content

Activation functions

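In an artificial neural network, the activation function of a node defines that node's output given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that are either on (1) or off (0), depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using a small number of nodes; such activation functions are called nonlinearities.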
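
The role of the nonlinearity can be checked numerically. The sketch below, a toy NumPy example with made-up shapes and hand-picked weights (none of it comes from the paper), shows two things: stacking linear layers without an activation collapses into a single linear layer, while inserting ReLU lets a network with only two hidden nodes compute XOR, which no single linear layer can represent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))

# Without an activation, two layers equal one linear layer with weights W1 @ W2.
two_linear_layers = (x @ W1) @ W2
one_linear_layer = x @ (W1 @ W2)
collapses = np.allclose(two_linear_layers, one_linear_layer)

# With ReLU in between, the network is no longer the same linear map.
with_relu = relu(x @ W1) @ W2
differs = not np.allclose(with_relu, one_linear_layer)

# XOR with two hidden ReLU nodes (weights chosen by hand for illustration):
#   h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2 * h2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
W_h = np.array([[1.0, 1.0],
                [1.0, 1.0]])
b_h = np.array([0.0, -1.0])
w_out = np.array([1.0, -2.0])
xor = relu(X @ W_h + b_h) @ w_out

print(collapses, differs)  # True True
print(xor)                 # [0. 1. 1. 0.]
```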