Activation functions are essential for introducing nonlinearity into neural networks, and the Rectified Linear Unit (ReLU) is often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we interpret ReLU as a projection from R onto the nonnegative half-line R+. Building on this interpretation, we generalize ReLU by replacing it with a projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection. The result is the Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further prove that FNNs activated by SOC projections outperform those using ReLU in terms of expressive power. Experiments on widely adopted architectures corroborate MPU's effectiveness against a broad range of existing activation functions.
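To make the projection view concrete, the following is a minimal sketch of ReLU and of the Euclidean projection onto a second-order cone, written in plain Python. It is not the authors' implementation. The function names `relu` and `soc_projection` are hypothetical, and the convention that the last coordinate plays the role of the cone's "height" t, with the cone defined as {(x, t) : ||x||_2 <= t}, is an assumption that may differ from the one used in the paper.

```python
import math


def relu(x):
    """Euclidean projection of a scalar onto the nonnegative half-line R+."""
    return max(x, 0.0)


def soc_projection(v):
    """Euclidean projection of v = (x_1, ..., x_{n-1}, t) onto the
    second-order cone {(x, t) : ||x||_2 <= t}.

    Assumption: the last entry of v is t; the paper may order coordinates
    differently.
    """
    *x, t = v
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= t:
        # v already lies in the cone, so it is its own projection.
        return list(v)
    if norm <= -t:
        # v lies in the polar cone, so it projects to the origin.
        return [0.0] * len(v)
    # Otherwise norm > |t| >= 0, and v projects onto the cone's boundary.
    scale = (norm + t) / (2.0 * norm)
    return [scale * xi for xi in x] + [(norm + t) / 2.0]


# In one dimension the cone reduces to R+, so the projection matches ReLU.
print(soc_projection([-2.0]), relu(-2.0))
# A point outside the cone is mapped to the boundary, where ||x||_2 == t.
print(soc_projection([3.0, 4.0, 0.0]))
```

Under this convention, a one-dimensional input has an empty x part, so the cone is simply {t >= 0} and the projection coincides with ReLU. In higher dimensions the output coordinates depend jointly on all inputs, which is the multi-input, multi-output behavior the abstract describes.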