Given any deep fully connected neural network initialized with random Gaussian parameters, we derive an upper bound on the quadratic Wasserstein distance between its output distribution and a suitable Gaussian process. Our explicit inequalities show how the sizes of the hidden and output layers affect the Gaussian behaviour of the network, and they quantitatively recover the distributional convergence results in the wide limit, i.e., when all hidden layer sizes become large.
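To make the wide-limit statement concrete, here is a minimal numerical sketch. It is not the method of the paper, which proves explicit bounds, and it does not reproduce those bounds. It only illustrates the phenomenon under several assumptions of our own: ReLU activation, weights drawn from N(0, σ_w²/fan_in), biases from N(0, σ_b²), a single fixed input `x`, and a scalar output. For a single input, the pre-activations of each layer are, conditionally on the previous layer, i.i.d. Gaussian with variance σ_w²·‖h‖²/n + σ_b², so outputs can be sampled exactly without forming weight matrices. The limiting Gaussian variance follows the standard infinite-width recursion, which for ReLU uses E[relu(Z)²] = K/2 when Z ~ N(0, K). The 1D quadratic Wasserstein distance is only estimated by matching sorted Monte Carlo samples to Gaussian quantiles, so the estimate also contains sampling error. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def limit_variance(x, widths, sigma_w2=2.0, sigma_b2=0.0):
    """Infinite-width (Gaussian process) variance of the scalar output at input x.

    Assumes a ReLU network, so E[relu(Z)^2] = K / 2 for Z ~ N(0, K).
    """
    k = sigma_w2 * float(np.dot(x, x)) / len(x) + sigma_b2  # first-layer variance
    for _ in widths:  # one recursion step per hidden layer
        k = sigma_w2 * k / 2.0 + sigma_b2
    return k


def sample_outputs(x, widths, n_samples, rng, sigma_w2=2.0, sigma_b2=0.0):
    """Exact samples of the scalar output of a randomly initialized ReLU network.

    Conditionally on the previous layer, the pre-activations at a single input are
    i.i.d. N(0, v), with v = sigma_w2 * ||h||^2 / fan_in + sigma_b2.
    """
    v = np.full(n_samples, sigma_w2 * float(np.dot(x, x)) / len(x) + sigma_b2)
    for n in widths:
        z = rng.standard_normal((n_samples, n)) * np.sqrt(v)[:, None]
        h = np.maximum(z, 0.0)  # ReLU activation
        v = sigma_w2 * np.sum(h**2, axis=1) / n + sigma_b2
    # Scalar output unit, conditionally Gaussian given the last hidden layer.
    return rng.standard_normal(n_samples) * np.sqrt(v)


def w2_to_gaussian(samples, var):
    """Rough estimate of W2(empirical law of samples, N(0, var)) in one dimension.

    In 1D the optimal coupling is monotone, so sorted samples are matched to
    Gaussian quantiles at levels (i + 0.5) / m.
    """
    s = np.sort(samples)
    m = len(s)
    g = NormalDist(0.0, float(np.sqrt(var)))
    q = np.array([g.inv_cdf((i + 0.5) / m) for i in range(m)])
    return float(np.sqrt(np.mean((s - q) ** 2)))


rng = np.random.default_rng(0)
x = np.ones(4)
for n in (2, 16, 256):
    widths = [n, n]  # two hidden layers of equal width
    out = sample_outputs(x, widths, 10000, rng)
    print(n, w2_to_gaussian(out, limit_variance(x, widths)))
```

With σ_w² = 2 and σ_b² = 0 the limiting variance stays constant across layers, which keeps the comparison simple. As the hidden widths grow, the estimated distance should shrink, in line with the qualitative wide-limit behaviour described above. The exact rate is what the paper's inequalities quantify.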