This paper discusses the use of the Absolute activation function in classification neural networks. Examples are shown of using this activation function in both simple and more complex problems. Using the LeNet-5 network for the MNIST problem as a baseline, the effectiveness of the Absolute activation function is compared with that of the Tanh, ReLU and SeLU activations. It is shown that in deep networks the Absolute activation does not cause vanishing or exploding gradients, and can therefore be used in both simple and deep neural networks. Because training networks with the Absolute activation is highly volatile, a special modification of the ADAM training algorithm is used: at each training epoch it analyzes the validation dataset to estimate a lower bound on the accuracy on any test dataset, uses this value to decide when to stop training or decrease the learning rate, and re-initializes the ADAM algorithm between these steps. It is shown that solving the MNIST problem with LeNet-like architectures based on the Absolute activation makes it possible to significantly reduce the number of trained parameters in the neural network while improving prediction accuracy.
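As a rough illustration of the gradient claim, the sketch below compares how the activation derivative alone scales a backpropagated signal for the Absolute activation and for Tanh. It is a minimal sketch, not the paper's analysis: the function names and the pre-activation values in `path` are hypothetical, the derivative of |x| at 0 is set to 0 by convention, and the weight matrices, which also affect gradient magnitude in a real network, are ignored.

```python
import math


def absolute(x):
    """Absolute activation: f(x) = |x|."""
    return abs(x)


def absolute_grad(x):
    """Subgradient of |x|: sign(x), with 0 chosen at x == 0 (a convention)."""
    return (x > 0) - (x < 0)


def tanh_grad(x):
    """Derivative of tanh(x): 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def chained_grad_magnitude(grad_fn, pre_activations):
    """Magnitude of the product of activation derivatives along one path."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return abs(product)


# Hypothetical pre-activation values along one path through a 6-layer network.
path = [2.0, -1.5, 3.0, -2.5, 1.0, -4.0]

# Each Absolute derivative is +1 or -1, so the magnitude of the product stays 1.
abs_factor = chained_grad_magnitude(absolute_grad, path)

# Each Tanh derivative is below 1 here, so the product shrinks with depth.
tanh_factor = chained_grad_magnitude(tanh_grad, path)

print(abs_factor, tanh_factor)
```

For nonzero pre-activations the Absolute derivative only flips the sign of the signal, whereas the saturating Tanh derivative attenuates it at every layer.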