In this paper, we introduce the Hyperbolic Tangent Exponential Linear Unit (TeLU), a novel neural network activation function defined as $f(x) = x \cdot \tanh(e^x)$. TeLU is designed to overcome limitations of conventional activation functions such as ReLU, GELU, and Mish by addressing the vanishing gradient problem and, to an extent, the exploding gradient problem. Our theoretical analysis and empirical assessments indicate that TeLU outperforms existing activation functions in stability and robustness, and that it shifts the mean of activation outputs toward zero, which improves training stability and convergence. Extensive evaluations against popular activation functions (ReLU, GELU, SiLU, Mish, Logish, Smish) across advanced architectures, including ResNet-50, demonstrate TeLU's lower variance and superior performance, even under hyperparameter settings optimized for the other functions. In large-scale tests on challenging datasets such as CIFAR-10, CIFAR-100, and TinyImageNet, spanning 860 scenarios, TeLU was consistently effective, positioning it as a potential new standard for neural network activation functions that improves stability and performance across diverse deep learning applications.
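As a minimal sketch of the definition $f(x) = x \cdot \tanh(e^x)$, the function and its derivative can be written in plain Python for scalar inputs. The derivative follows from the product and chain rules and is not quoted from the paper. The names `telu`, `telu_grad`, and `_SATURATION_CUTOFF` are illustrative assumptions, as is the cutoff value itself, which only guards against overflow in `math.exp`.

```python
import math

# Illustrative cutoff (an assumption, not from the paper). Above it,
# tanh(e^x) is exactly 1.0 in double precision, and skipping exp()
# avoids an OverflowError for very large x.
_SATURATION_CUTOFF = 20.0


def telu(x: float) -> float:
    """TeLU as stated in the abstract: f(x) = x * tanh(e^x)."""
    if x > _SATURATION_CUTOFF:
        return x
    return x * math.tanh(math.exp(x))


def telu_grad(x: float) -> float:
    """Derivative via the product and chain rules:
    f'(x) = tanh(e^x) + x * e^x * (1 - tanh(e^x)^2).
    """
    if x > _SATURATION_CUTOFF:
        return 1.0
    ex = math.exp(x)
    t = math.tanh(ex)
    return t + x * ex * (1.0 - t * t)


if __name__ == "__main__":
    # Print a few sample points to see the shape of the function.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  telu={telu(x):+.6f}  grad={telu_grad(x):+.6f}")
```

For large positive inputs the function approaches the identity, and for large negative inputs it decays toward zero, since $\tanh(e^x) \approx e^x$ when $e^x$ is small. A framework implementation would apply the same expression elementwise to tensors.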