In artificial neural networks, the activation dynamics of non-trainable variables is strongly coupled to the learning dynamics of trainable variables. During the activation pass, the boundary neurons (e.g., input neurons) are mapped to the bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that the composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., the dataset) and a tangent subspace of trainable variables (i.e., the learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces, but in a learning equilibrium the problem can be linearized and reduced to many weakly coupled one-dimensional problems. We use the duality to study the emergence of criticality, i.e., power-law distributions of fluctuations of the trainable variables. In particular, we show that criticality can emerge in the learning system even when the dataset is in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.
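As a rough illustration of how the activation pass and the learning pass compose into a single map from a data point to a change in trainable variables, consider the minimal sketch below. It assumes a single tanh layer, a squared-error loss, and plain gradient descent, none of which are specified in the abstract; the function names (`activation_pass`, `learning_pass`, `dataset_to_learning`) and the learning rate `lr` are hypothetical and chosen only for this example.

```python
import numpy as np

def activation_pass(W, b, x):
    """Map boundary (input) neurons x to bulk neurons h (assumed tanh layer)."""
    return np.tanh(W @ x + b)

def learning_pass(W, b, x, y, lr=0.1):
    """Map boundary and bulk neurons to changes in the trainable variables."""
    h = activation_pass(W, b, x)
    # Gradient of the assumed loss 0.5 * ||h - y||^2 w.r.t. the pre-activation.
    delta = (h - y) * (1.0 - h**2)
    dW = -lr * np.outer(delta, x)  # gradient-descent step for the weights
    db = -lr * delta               # gradient-descent step for the biases
    return dW, db

def dataset_to_learning(W, b, x, y, lr=0.1):
    """Composition of the two passes: data point (x, y) -> step (dW, db)."""
    return learning_pass(W, b, x, y, lr)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # trainable weights
b = np.zeros(3)               # trainable biases
x = rng.normal(size=4)        # non-trainable boundary (input) variables
y = np.zeros(3)               # non-trainable boundary (target) variables

dW, db = dataset_to_learning(W, b, x, y)
print(dW.shape, db.shape)  # (3, 4) (3,)
```

In this toy setting, each data point `(x, y)` is sent to a tangent vector `(dW, db)` in the space of trainable variables, which is the kind of correspondence the dataset-learning duality describes; the general, non-linear, high-dimensional case treated in the paper is not captured by this sketch.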