We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we adopt a non-asymptotic viewpoint: the number of neurons in each bounded region of the input domain (i.e., the width) is itself a Poisson random variable with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes but also to non-Gaussian processes, depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
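To make the non-asymptotic construction concrete, the following is a minimal one-dimensional simulation sketch, not the paper's general construction: the width is drawn as a Poisson random variable with mean proportional to a density parameter, the activation thresholds are uniform on the domain, and the output weights are i.i.d. standard normal. The helper `sample_network` and the one-sided form $f(x) = \sum_k w_k\, \mathrm{ReLU}(x - \tau_k)$ are illustrative choices, not notation from the source. For zero-mean, unit-variance weights, Campbell's formula gives $\mathrm{Var}\, f(x) = \lambda x^3/3$ on $[0, T]$, so the standard deviation grows like $x^{3/2}$, consistent with the stated Hurst exponent $3/2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_network(lam, T):
    """Draw one random shallow ReLU network on [0, T].

    Width n ~ Poisson(lam * T); thresholds tau_k ~ Uniform(0, T);
    output weights w_k i.i.d. standard normal (illustrative choice).
    """
    n = rng.poisson(lam * T)            # random width
    tau = rng.uniform(0.0, T, size=n)   # activation thresholds
    w = rng.standard_normal(n)          # i.i.d. output weights

    def f(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # f(x) = sum_k w_k * ReLU(x - tau_k)
        return np.maximum(x[:, None] - tau[None, :], 0.0) @ w

    return f

# Empirical check of the x^(3/2) scaling: for this one-sided toy
# construction, Campbell's formula gives Var f(x) = lam * x^3 / 3
# (zero-mean, unit-variance weights), i.e., std grows like x^(3/2).
lam, T = 50.0, 1.0
xs = np.array([0.25, 0.5, 1.0])
vals = np.stack([sample_network(lam, T)(xs) for _ in range(4000)])
print("empirical std:", vals.std(axis=0))
print("predicted std:", np.sqrt(lam * xs**3 / 3.0))
```

The weight law is the lever the abstract highlights: under the stated hypotheses, different choices of this law can drive the wide-width limit toward Gaussian or non-Gaussian processes.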