Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges that are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(1/N)$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, to which it converges as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE, and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods yield the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and the size of the neural network grow to infinity.
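To make the scaling contrast concrete, here is a minimal schematic with generic notation (the symbols $\theta^i_k$, $G^i$, $h^i_t$, $\sigma$, $g$, $W^{ij}$, and $U^i$ are illustrative placeholders, not the paper's definitions). In the standard mean-field regime, each of the $N$ parameters of a feedforward network moves by $\mathcal{O}(1/N)$ per gradient step,
$$\theta^i_{k+1} = \theta^i_k + \frac{\alpha}{N}\, G^i(\theta_k, x_k), \qquad k = 0, 1, \ldots, \lfloor tN \rfloor - 1,$$
which is precisely a forward Euler scheme with step size $1/N$; as $N \rightarrow \infty$ it converges to an ODE of the form $\dot{\theta}_t = \alpha\, \bar{G}(\theta_t)$, where $\bar{G}$ averages $G$ over the data. A generic RNN hidden state, by contrast, evolves as
$$h^i_{t+1} = \sigma\Big( \frac{1}{N} \sum_{j=1}^{N} W^{ij}\, g(h^j_t) + U^i x_{t+1} \Big),$$
so each update changes $h^i$ by $\mathcal{O}(1)$ regardless of $N$: there is no vanishing step size, hence no Euler-scheme interpretation, which is why the fixed point analysis described above is required.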
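The Poisson-equation device mentioned above is a standard tool for controlling fluctuations of ergodic averages; a schematic version, again with illustrative notation rather than the paper's, is as follows. For a data Markov chain $(x_k)$ with transition operator $P$ and stationary distribution $\pi$, one solves the Poisson equation
$$u(x) - P u(x) = f(x) - \pi(f)$$
for a test function $f$. Summing along the chain then telescopes into boundary terms plus a martingale:
$$\sum_{k=0}^{n-1} \big( f(x_k) - \pi(f) \big) = u(x_0) - P u(x_{n-1}) + \sum_{k=1}^{n-1} \big( u(x_k) - P u(x_{k-1}) \big),$$
where each summand on the right is a martingale difference since $\mathbb{E}[u(x_k) \mid x_{k-1}] = P u(x_{k-1})$. Martingale inequalities then bound the fluctuations at order $\sqrt{n}$ even though the raw summands $f(x_k)$ are strongly correlated.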