We develop a general theory of synaptic neural balance and of how it can emerge or be enforced in neural networks. For a given regularizer, a neuron is said to be in balance if the total cost of its input weights equals the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all $L_p$ regularizers. The third direction is the extension to non-layered architectures, recurrent architectures, and convolutional architectures, as well as architectures with mixed activation functions. Gradient descent on the error function alone does not in general converge to a balanced state, in which every neuron is in balance, even when started from a balanced state. However, gradient descent on the regularized error function ought to converge to a balanced state, and thus network balance can be used to assess learning progress. The theory is based on two local neuronal operations: scaling, which is commutative, and balancing, which is not. Given any initial set of weights, when local balancing operations are applied to neurons in a stochastic manner, global order always emerges: the stochastic balancing algorithm converges to the same unique set of balanced weights. The reason for this is the existence of an underlying strictly convex optimization problem in which the relevant variables are constrained to a linear manifold that depends only on the architecture. Simulations show that balancing neurons prior to learning, or during learning in alternation with gradient descent steps, can improve learning speed and final performance.
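To make the balance condition and the two local operations concrete, the following is a minimal sketch, in NumPy, of scaling and stochastic balancing for a fully connected ReLU network with $L_2$ regularization. It relies on the positive homogeneity of ReLU: multiplying a neuron's incoming weights by $\lambda > 0$ and its outgoing weights by $1/\lambda$ leaves the network function unchanged, and choosing $\lambda = \sqrt{\lVert w_{\mathrm{out}}\rVert / \lVert w_{\mathrm{in}}\rVert}$ equalizes the $L_2$ costs of the incoming and outgoing weights. The function names, the weight-matrix layout, and the omission of biases are our own illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (not the paper's code) of local scaling/balancing for a
# fully connected ReLU network with L2 regularization. Assumptions: weights
# are stored as a list of matrices W[l] of shape (fan_out, fan_in); biases
# are ignored for brevity (they would be rescaled by lambda as well).
import numpy as np

def balance_neuron(W, layer, j, eps=1e-12):
    """Rescale the incoming weights of hidden neuron j in `layer` by lambda
    and its outgoing weights by 1/lambda. For ReLU units this leaves the
    network's input-output function unchanged; lambda is chosen so that the
    L2 cost of the incoming weights equals that of the outgoing weights."""
    w_in = W[layer][j, :]          # incoming weights of neuron j
    w_out = W[layer + 1][:, j]     # outgoing weights of neuron j
    lam = np.sqrt((np.linalg.norm(w_out) + eps) / (np.linalg.norm(w_in) + eps))
    W[layer][j, :] *= lam
    W[layer + 1][:, j] /= lam

def stochastic_balancing(W, hidden_sizes, n_steps=10_000, seed=0):
    """Apply the local balancing operation to uniformly sampled hidden
    neurons; per the abstract, this converges to a unique balanced state."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        layer = rng.integers(len(hidden_sizes))   # index among hidden layers
        j = rng.integers(hidden_sizes[layer])     # neuron index in that layer
        balance_neuron(W, layer, j)
    return W

# Usage on a hypothetical 3-4-5-2 ReLU network: the total L2 cost decreases
# while the input-output function is preserved.
sizes = [3, 4, 5, 2]
rng = np.random.default_rng(1)
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
stochastic_balancing(W, hidden_sizes=sizes[1:-1])
```

Because each balancing step only rescales one neuron's fan-in and fan-out, the operations are purely local, which is what allows them to be applied in any stochastic order while still converging to the same balanced weights.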