We develop a theory of neural synaptic balance and of how it can emerge or be enforced in neural networks. For a given additive cost function $R$ (regularizer), a neuron is said to be in balance if the total cost of its input weights equals the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all $L_p$ ($p>0$) regularizers. The third direction is the extension to non-layered, recurrent, and convolutional architectures, as well as architectures with mixed activation functions. The theory is based on two local neuronal operations: scaling, which is commutative, and balancing, which is not. Finally, and most importantly, given any initial set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this convergence is the existence of an underlying strictly convex optimization problem in which the relevant variables are constrained to a linear manifold that depends only on the architecture. The theory is corroborated through various simulations carried out on benchmark data sets. Scaling and balancing operations are entirely local and thus physically plausible in biological and neuromorphic networks.
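The balance condition for the basic $L_2$/ReLU case can be made concrete with a small sketch. Because ReLU is positively homogeneous, scaling a neuron's input weights by $\lambda > 0$ and its output weights by $1/\lambda$ leaves the network function unchanged; the balancing operation chooses the $\lambda$ minimizing the $L_2$ cost $\lambda^2 \lVert w_{\text{in}} \rVert^2 + \lVert w_{\text{out}} \rVert^2 / \lambda^2$, namely $\lambda^* = (\lVert w_{\text{out}} \rVert / \lVert w_{\text{in}} \rVert)^{1/2}$. The function and variable names below are illustrative, not from the paper:

```python
def l2_cost(ws):
    """Additive L2 cost R of a list of weights."""
    return sum(w * w for w in ws)

def balance(w_in, w_out):
    """Local balancing for one ReLU neuron: scale input weights by lam
    and output weights by 1/lam, with lam chosen to minimize the total
    L2 cost. The neuron's input-output function is unchanged."""
    lam = (l2_cost(w_out) / l2_cost(w_in)) ** 0.25
    return [lam * w for w in w_in], [w / lam for w in w_out]

w_in, w_out = [3.0, 4.0], [1.0, 2.0, 2.0]
b_in, b_out = balance(w_in, w_out)
# The two costs equalize at the geometric mean of the original
# costs: sqrt(25 * 9) = 15.
print(l2_cost(b_in), l2_cost(b_out))
```

In the stochastic balancing algorithm described above, this local operation is applied repeatedly to randomly chosen neurons; the abstract's convexity argument is what guarantees convergence to a unique balanced state.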