Deep learning models play an important role in classification tasks that require accurate detection of target objects. Their accuracy, however, depends on the activation function used in the hidden and output layers. In this paper, an activation function called TaLU, a combination of Tanh and the Rectified Linear Unit (ReLU), is used to improve prediction. ReLU is widely used in deep learning for its computational efficiency, ease of implementation, and intuitive behavior. However, it suffers from the dying gradient problem: when the input is negative, both the output and the gradient are zero, so the affected units stop updating during training. Researchers have proposed a number of alternatives to address this issue, the most notable being LeakyReLU, Softplus, Softsign, ELU, and ThresholdedReLU. This research developed TaLU, a modified activation function combining Tanh and ReLU, which mitigates the dying gradient problem of ReLU. A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10, where it outperforms ReLU and some of the other studied activation functions in accuracy (by 0\% to 6\% in most cases, when used with Batch Normalization and a reasonable learning rate).
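The dying gradient problem described above can be illustrated with a minimal sketch. The abstract does not give the exact definition of TaLU, so the function `talu_sketch` below is a hypothetical stand-in that assumes the simplest Tanh/ReLU combination: identity for non-negative inputs and Tanh for negative inputs. The function names and this piecewise form are assumptions for illustration, not the paper's definition.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: passes positive inputs, zeroes out the rest."""
    return max(x, 0.0)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: exactly zero for every negative input."""
    return 1.0 if x > 0 else 0.0


def talu_sketch(x: float) -> float:
    """ASSUMED form, not taken from the paper: x for x >= 0, tanh(x) for x < 0."""
    return x if x >= 0 else math.tanh(x)


def talu_sketch_grad(x: float) -> float:
    """Derivative of the assumed form: 1 - tanh(x)^2 on the negative side."""
    return 1.0 if x >= 0 else 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    # Compare the gradients on a few negative and positive inputs.
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(f"x={x:+.1f}  relu'={relu_grad(x):.4f}  talu_sketch'={talu_sketch_grad(x):.4f}")
```

Under this assumed form, negative inputs still receive a small non-zero gradient, which is the general mechanism by which a Tanh branch can mitigate dying units. The gradient shrinks toward zero for strongly negative inputs, so this sketch reduces the problem rather than removing it.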