Activation functions play a significant role in neural network design by introducing non-linearity. The choice of activation function has previously been shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is therefore important for the design of neural architectures and training algorithms. This study empirically investigates the loss landscapes of neural networks that use the hyperbolic tangent (tanh), rectified linear unit (ReLU), and exponential linear unit (ELU) activation functions. ReLU is shown to yield the most convex loss landscape, while ELU is shown to yield the least flat loss landscape and to exhibit superior generalisation performance. Wide and narrow valleys are found in the loss landscapes of all three activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.
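For reference, the three activation functions and their derivatives can be written out as below, together with a simple way of counting saturated neurons. This is a minimal sketch, not the study's method: the abstract does not define saturation, so the gradient-threshold criterion, the helper name `saturated_fraction`, and the threshold `eps` are hypothetical. ELU uses the common default `alpha = 1.0`. Under this criterion, tanh saturates in both tails, ELU saturates only for large negative inputs, and ReLU has exactly zero gradient for every negative input (often called an inactive or "dead" unit rather than a saturated one).

```python
import math

# Activation functions considered in the study.
def tanh(x: float) -> float:
    return math.tanh(x)

def relu(x: float) -> float:
    return max(0.0, x)

def elu(x: float, alpha: float = 1.0) -> float:
    # Identity for positive inputs, smooth exponential approach to -alpha otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Derivatives with respect to the pre-activation x.
def d_tanh(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2

def d_relu(x: float) -> float:
    # Subgradient convention: 0 at x == 0.
    return 1.0 if x > 0 else 0.0

def d_elu(x: float, alpha: float = 1.0) -> float:
    return 1.0 if x > 0 else alpha * math.exp(x)

def saturated_fraction(pre_activations, derivative, eps=1e-3):
    """Hypothetical criterion: a neuron counts as saturated when the magnitude
    of its local gradient falls below eps."""
    count = sum(1 for z in pre_activations if abs(derivative(z)) < eps)
    return count / len(pre_activations)

if __name__ == "__main__":
    z = [-10.0, -2.0, 0.5, 10.0]  # example pre-activations for four neurons
    for name, d in [("tanh", d_tanh), ("relu", d_relu), ("elu", d_elu)]:
        print(name, saturated_fraction(z, d))
```

With these example inputs, tanh flags the two large-magnitude neurons, ReLU flags both negative ones, and ELU flags only the neuron at -10.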