Activation functions are essential to deep learning networks. Most popular and versatile activation functions are monotonic, although some non-monotonic activation functions are being explored and show promising performance. However, by introducing non-monotonicity they also alter the response to positive inputs, which the success of ReLU and its variants suggests is unnecessary. In this paper, we advance the development of non-monotonic activation functions and propose the Saturated Gaussian Error Linear Units, which combine the characteristics of ReLU and non-monotonic activation functions. We present three new activation functions built with our proposed method: SGELU, SSiLU, and SMish, which are composed of the negative portion of GELU, SiLU, and Mish, respectively, and the positive portion of ReLU. Image classification experiments on CIFAR-100 indicate that the proposed activation functions are highly effective and outperform state-of-the-art baselines across multiple deep learning architectures.
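The construction described in the abstract can be illustrated with a minimal scalar sketch. It assumes the two portions are joined at zero, so that each function returns the input unchanged for x >= 0 and the value of the base function for x < 0; the abstract does not state the split point or the exact formulas, so this piecewise form, the helper name `saturate`, and the standard textbook definitions of GELU, SiLU, and Mish used here are assumptions rather than the authors' implementation.

```python
import math


def gelu(x: float) -> float:
    # Standard exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    # Standard SiLU: x * sigmoid(x), written to avoid overflow for large |x|.
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def mish(x: float) -> float:
    # Standard Mish: x * tanh(softplus(x)), with a numerically stable softplus.
    softplus = math.log1p(math.exp(-abs(x))) + max(x, 0.0)
    return x * math.tanh(softplus)


def saturate(base):
    # Hypothetical helper (not from the paper): keep ReLU's identity branch
    # for non-negative inputs and the base function's values for negative ones.
    # The split at zero is an assumption of this sketch.
    def activation(x: float) -> float:
        return x if x >= 0.0 else base(x)

    return activation


sgelu = saturate(gelu)
ssilu = saturate(silu)
smish = saturate(mish)

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 2.0):
        print(x, sgelu(x), ssilu(x), smish(x))
```

Under these assumptions, positive inputs pass through exactly as with ReLU, while negative inputs retain the small non-monotonic dip of the base function instead of being zeroed out.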