In artificial neural networks, the activation dynamics of non-trainable variables are strongly coupled to the learning dynamics of trainable variables. During the activation pass, boundary neurons (e.g., input neurons) are mapped to bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that the composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., the dataset) and a tangent subspace of trainable variables (i.e., learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces. We use this duality to study the emergence of criticality, i.e., the power-law distribution of fluctuations of the trainable variables, using a toy model at learning equilibrium. In particular, we show that criticality can emerge in the learning system even from a dataset in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.
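The composition of the two passes can be sketched in code. The following is a minimal illustration, not the paper's model: a one-hidden-layer network in which the activation pass maps boundary neurons (input) to bulk neurons (hidden), the learning pass maps boundary and bulk activations to weight changes via a gradient step on a quadratic loss, and their composition realizes the duality map from a dataset point to a tangent vector in the space of trainable variables. All names, sizes, and the choice of tanh activation are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup: sizes and tanh activation are assumptions,
# chosen only to illustrate the structure of the two maps.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # input -> hidden weights (trainable)
W2 = rng.normal(size=(4, 2))   # hidden -> output weights (trainable)

def activation_pass(x):
    """Map boundary neurons (input x) to bulk neurons (hidden h)
    and back to boundary neurons (output y)."""
    h = np.tanh(x @ W1)        # bulk (hidden) activations
    y = h @ W2                 # boundary (output) activations
    return h, y

def learning_pass(x, h, y, target, lr=0.1):
    """Map boundary and bulk activations to changes in trainable
    variables: one gradient step on the quadratic loss |y - target|^2/2."""
    e = y - target                     # output error
    dW2 = -lr * np.outer(h, e)         # hidden -> output update
    dh = (e @ W2.T) * (1.0 - h**2)     # backprop through tanh
    dW1 = -lr * np.outer(x, dh)        # input -> hidden update
    return dW1, dW2

def duality_map(x, target):
    """Composition of the two passes: a dataset point (x, target) is
    mapped to a tangent vector (dW1, dW2) of the trainable variables."""
    h, y = activation_pass(x)
    return learning_pass(x, h, y, target)

x = rng.normal(size=3)
target = np.zeros(2)
dW1, dW2 = duality_map(x, target)
print(dW1.shape, dW2.shape)   # tangent vector has the shape of (W1, W2)
```

In this sketch the duality map is non-linear in the dataset point (through the tanh activation and the outer products), consistent with the abstract's remark that the dataset-learning duality is in general a complex non-linear map between high-dimensional spaces.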