We present a unified approach to obtaining scaling limits of neural networks using the genus expansion technique from random matrix theory. This approach begins with a novel expansion of neural networks, reminiscent of Butcher series for ODEs, obtained through a generalisation of Fa\`a di Bruno's formula to an arbitrary number of compositions. In this expansion, the role of monomials is played by random multilinear maps indexed by directed graphs whose edges correspond to random matrices; we call these operator graphs. The expansion linearises the effect of the activation functions, allowing the direct application of Wick's principle to compute the expectation of each of its terms. We then determine the leading contribution to each term by embedding the corresponding graphs onto surfaces and computing their Euler characteristics. Furthermore, by developing a correspondence between analytic and graphical operations, we obtain similar graph expansions for the neural tangent kernel as well as for the input-output Jacobian of the original network, and derive their infinite-width limits with relative ease. Notably, we find explicit formulae for the moments of the limiting singular value distribution of the Jacobian. Finally, we show that all of these results hold for networks with more general weights, such as matrices with i.i.d. entries satisfying suitable moment assumptions, complex matrices, and sparse matrices.
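For orientation, the two classical identities behind the argument above are worth recalling; the following is a textbook-form reminder, not the paper's precise normalisation. Wick's principle (Isserlis' theorem) states that for centred jointly Gaussian random variables $X_1, \dots, X_{2k}$,
\[
\mathbb{E}\bigl[X_1 X_2 \cdots X_{2k}\bigr] = \sum_{\pi \in \mathcal{P}_2(2k)} \prod_{\{i,j\} \in \pi} \mathbb{E}\bigl[X_i X_j\bigr],
\]
where $\mathcal{P}_2(2k)$ denotes the set of pair partitions of $\{1, \dots, 2k\}$, and odd moments vanish. The genus then enters through Euler's formula: a connected graph with $V$ vertices, $E$ edges, and $F$ faces, cellularly embedded in a closed orientable surface of genus $g$, satisfies
\[
\chi = V - E + F = 2 - 2g,
\]
so that, as in classical Gaussian matrix models, terms whose graphs embed in the sphere ($g = 0$) carry the maximal power of the width and dominate the large-width limit.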