TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear Unit to Enhance Neural Networks

The application of deep learning models to classification plays an important role in the accurate detection of target objects. However, accuracy is affected by the activation functions used in the hidden and output layers. In this paper, an activation function called TaLU, a combination of Tanh and the Rectified Linear Unit (ReLU), is used to improve prediction. The ReLU activation function is used by many deep learning researchers for its computational efficiency, ease of implementation, and intuitive nature. However, it suffers from the dying gradient problem. For a negative input, its output is always zero and so is its gradient, so a neuron whose pre-activation stays negative receives no weight updates. A number of researchers have used different approaches to solve this issue; some of the most notable are LeakyReLU, Softplus, Softsign, ELU, and ThresholdedReLU. This research developed TaLU, a modified activation function combining Tanh and ReLU, which mitigates the dying gradient problem of ReLU. A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10. It outperforms ReLU and some of the other studied activation functions in accuracy (by 0% to 6% in most cases, when used with Batch Normalization and a reasonable learning rate).
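The abstract does not state TaLU's formula, so the following is only an illustration of the gradient argument. It compares the derivatives of ReLU and several of the alternatives named above at a negative input. It also adds a hypothetical Tanh/ReLU hybrid, `tanh_relu_hybrid` (identity for x ≥ 0, tanh(x) for x < 0). That hybrid is an assumption made for this sketch, not necessarily the authors' TaLU definition. The default slopes (alpha = 0.01 for LeakyReLU, alpha = 1.0 for ELU) are common choices, also assumed here.

```python
import math
from typing import Callable

# Each activation's derivative is written out by hand so the behaviour for
# negative inputs (where ReLU "dies") can be compared directly.

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Zero for every non-positive input: no error signal flows back.
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    # Small constant slope for negative inputs (alpha is an assumed default).
    return 1.0 if x > 0 else alpha

def elu_grad(x: float, alpha: float = 1.0) -> float:
    # Derivative of alpha * (exp(x) - 1) on the negative side.
    return 1.0 if x > 0 else alpha * math.exp(x)

def softplus_grad(x: float) -> float:
    # The derivative of log(1 + exp(x)) is the logistic sigmoid;
    # the two branches keep math.exp from overflowing.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softsign_grad(x: float) -> float:
    # Derivative of x / (1 + |x|).
    return 1.0 / (1.0 + abs(x)) ** 2

def tanh_relu_hybrid(x: float) -> float:
    # HYPOTHETICAL hybrid for illustration only (not necessarily TaLU):
    # ReLU's identity branch for x >= 0, tanh for x < 0.
    return x if x >= 0 else math.tanh(x)

def tanh_relu_hybrid_grad(x: float) -> float:
    # 1 on the identity branch; 1 - tanh(x)^2 on the tanh branch, which is
    # positive for moderate negative x instead of dropping to zero.
    return 1.0 if x >= 0 else 1.0 - math.tanh(x) ** 2

GRADIENTS: dict[str, Callable[[float], float]] = {
    "ReLU": relu_grad,
    "LeakyReLU": leaky_relu_grad,
    "ELU": elu_grad,
    "Softplus": softplus_grad,
    "Softsign": softsign_grad,
    "tanh/ReLU hybrid": tanh_relu_hybrid_grad,
}

if __name__ == "__main__":
    x = -2.0
    for name, grad in GRADIENTS.items():
        print(f"{name:>16}: d/dx at x = {x} -> {grad(x):.4f}")
```

Because tanh'(0) = 1, this hypothetical hybrid has a continuous first derivative at zero. For negative inputs its gradient, 1 - tanh²(x), is positive but shrinks toward zero as x becomes very negative.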

Related content

Activation function

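In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0), depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using only a small number of nodes; such activation functions are called nonlinearities.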
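The point about nonlinearity can be seen in a tiny hand-wired network. This minimal sketch uses the ON/OFF step unit described above to compute XOR, which no single linear threshold unit can do. Its weights are hypothetical, chosen by hand purely for illustration and not taken from any source. Swapping the step function for the identity makes every layer linear, and the same weights then collapse to a constant output.

```python
from typing import Callable

def step(z: float) -> float:
    # Binary ON/OFF unit: 1 when the weighted input is positive, else 0.
    return 1.0 if z > 0 else 0.0

def identity(z: float) -> float:
    # Linear "activation": passes the weighted sum through unchanged.
    return z

def two_layer_net(x1: float, x2: float, act: Callable[[float], float]) -> float:
    # Hand-picked (hypothetical) weights and biases.
    h1 = act(x1 + x2 - 0.5)    # fires if at least one input is on (OR-like)
    h2 = act(x1 + x2 - 1.5)    # fires only if both inputs are on (AND-like)
    return act(h1 - h2 - 0.5)  # "OR but not AND"

if __name__ == "__main__":
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, two_layer_net(x1, x2, step), two_layer_net(x1, x2, identity))
```

With `step` the outputs are 0, 1, 1, 0. With `identity` the hidden terms cancel algebraically: (x1 + x2 - 0.5) - (x1 + x2 - 1.5) - 0.5 = 0.5, so the network returns 0.5 for every input.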
