Learning Activation Functions for Sparse Neural Networks

Sparse Neural Networks (SNNs) can potentially match the performance of their dense counterparts while saving significant energy and memory at inference. However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions. While recent works mitigate this issue through sophisticated pruning techniques, we shift our focus to an overlooked factor: hyperparameters and activation functions. Our analyses show that the accuracy drop can additionally be attributed to (i) using ReLU universally as the default activation function, and (ii) fine-tuning SNNs with the same hyperparameters as their dense counterparts. Thus, we focus on learning a novel way to tune activation functions for sparse networks and combine it with a separate hyperparameter optimization (HPO) regime for sparse networks. In experiments on popular DNN models (LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0) trained on the MNIST, CIFAR-10, and ImageNet-16 datasets, we show that the novel combination of these two approaches, dubbed Sparse Activation Function Search (SAFS), yields absolute accuracy improvements of up to 15.53%, 8.88%, and 6.33% for LeNet-5, VGG-16, and ResNet-18, respectively, over the default training protocols, especially at high pruning ratios. Our code can be found at https://github.com/automl/SAFS
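To make the two ingredients named in the abstract concrete (a pruning ratio, and an activation function other than ReLU), the following is a minimal sketch in plain Python. It is not the SAFS method and is not taken from the linked repository: the helper names `magnitude_prune`, `swish`, and `neuron`, the use of magnitude-based pruning, the choice of Swish as the alternative activation, and all weight and input values are assumptions made purely for illustration.

```python
import math


def magnitude_prune(weights, pruning_ratio):
    """Zero out the smallest-magnitude fraction of weights.

    Hypothetical helper for illustration; the abstract does not say
    which pruning technique is used.
    """
    n_prune = int(len(weights) * pruning_ratio)
    # Indices ordered by absolute weight value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def relu(x):
    return max(0.0, x)


def swish(x, beta=1.0):
    # x * sigmoid(beta * x); beta is a tunable parameter (assumed example
    # of a non-ReLU activation, not necessarily one found by SAFS).
    return x / (1.0 + math.exp(-beta * x))


def neuron(inputs, weights, activation):
    # Weighted sum of the inputs followed by the chosen activation.
    return activation(sum(w * x for w, x in zip(weights, inputs)))


# Made-up weights and inputs, pruned at a 50% pruning ratio.
weights = [0.9, -0.05, 0.4, -0.7, 0.02, 0.1]
inputs = [0.5, 1.0, -1.0, 1.0, 1.0, 1.0]
sparse = magnitude_prune(weights, 0.5)

relu_out = neuron(inputs, sparse, relu)
swish_out = neuron(inputs, sparse, swish)
print(sparse, relu_out, swish_out)
```

With these made-up values the pre-activation of the pruned neuron is negative, so ReLU outputs exactly zero while Swish still passes a small negative signal. This only illustrates that the choice of activation changes what a sparse layer can propagate; it says nothing about which activation performs better in practice.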


Related content

Activation function


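In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions, each of which is either on (1) or off (0) depending on its input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using a small number of nodes; such activation functions are called nonlinearities.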
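The points in this description can be sketched in a few lines of plain Python. The example below is illustrative only; the function names and the specific weights are assumptions chosen by hand. It shows an on/off step activation acting as a logic gate, two stacked linear layers collapsing into a single linear map, and two ReLU hidden units computing XOR, which no single linear unit can do.

```python
def step(x):
    # On (1) / off (0) activation, as in a digital circuit.
    return 1 if x > 0 else 0


def relu(x):
    return max(0, x)


def and_gate(a, b):
    # A single perceptron with a step activation; weights chosen by hand.
    return step(a + b - 1.5)


def stacked_linear(x, w1=2, b1=1, w2=3, b2=-4):
    # Two linear layers with no activation in between.
    return w2 * (w1 * x + b1) + b2


def single_linear(x, w=6, b=-1):
    # The equivalent single layer: w = w2 * w1, b = w2 * b1 + b2.
    return w * x + b


def xor_relu(a, b):
    # Two hidden ReLU units followed by a linear read-out.
    h1 = relu(a + b)
    h2 = relu(a + b - 1)
    return h1 - 2 * h2


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), xor_relu(a, b))
```

The stacked linear layers always agree with the single linear layer, which is why depth without a nonlinearity adds no expressive power, whereas inserting ReLU between the layers lets a two-unit hidden layer represent XOR.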