Choosing an activation function is a critical design decision for deep learning models, since it affects their learning capacity, stability, and computational efficiency. In recent years, the Gaussian Error Linear Unit (GELU) has become a dominant choice, outperforming traditional functions such as the Rectified Linear Unit (ReLU) in a variety of applications. This study presents a rigorous mathematical investigation of GELU, examining its differentiability, boundedness, stationarity, and smoothness in detail. We also conduct an extensive experimental comparison of GELU against a broad range of alternative activation functions, using a residual convolutional network trained on the CIFAR-10, CIFAR-100, and STL-10 datasets as the empirical testbed. In these experiments GELU outperforms the other activation functions tested, which supports its suitability for a wide range of deep learning applications. The study deepens the understanding of GELU's mathematical properties and offers guidance for practitioners selecting activation functions that match their specific objectives and constraints.
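For orientation, the properties named above can be probed numerically. The abstract does not state the formula, so the sketch below assumes the standard definition GELU(x) = x·Φ(x), where Φ is the standard normal CDF. The function names `gelu`, `gelu_grad`, and `stationary_point` are illustrative and not taken from the study. This is a minimal check of differentiability, lower boundedness, and the location of the single stationary point, not a reproduction of the paper's analysis.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU under the assumed definition x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """Derivative of GELU: Phi(x) + x * phi(x), with phi the normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def stationary_point(lo: float = -1.0, hi: float = -0.5, iters: int = 60) -> float:
    """Bisection on gelu_grad; the bracket [lo, hi] is chosen so the sign changes."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gelu_grad(mid) < 0.0:
            lo = mid  # gradient still negative: root lies to the right
        else:
            hi = mid  # gradient non-negative: root lies to the left
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x_star = stationary_point()
    print(f"stationary point x* = {x_star:.4f}")
    print(f"minimum value GELU(x*) = {gelu(x_star):.4f}")
    # Unlike ReLU, GELU is smooth at the origin, with slope Phi(0) = 0.5.
    print(f"GELU'(0) = {gelu_grad(0.0):.4f}")
```

The bisection bracket works because the gradient is negative at x = -1 and positive at x = -0.5. For large positive inputs the function approaches the identity, and for large negative inputs it approaches zero from below, so it is bounded below but not above.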