We show that taking the width and depth of a deep neural network with skip connections to infinity, with branches scaled by $1/\sqrt{\text{depth}}$ (the only nontrivial scaling), results in the same covariance structure regardless of how the limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks whose depth is of the same order as their width. We also demonstrate that, in this case, the pre-activations have Gaussian distributions, which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
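As an illustration of the claim about the order of limits, the following is a minimal simulation sketch, not the authors' experimental code. It assumes a ReLU residual network with the update $y_{l} = y_{l-1} + \frac{1}{\sqrt{\text{depth}}} W_l\,\mathrm{ReLU}(y_{l-1})$, Gaussian weights of variance $1/\text{width}$, and unit-variance first-layer pre-activations. None of these choices are stated in the abstract. Under these assumptions, the infinite-width recursion for the pre-activation variance gives $(1 + \frac{1}{2\,\text{depth}})^{\text{depth}}$, which tends to $e^{1/2}$ as depth grows. The function names `final_second_moment`, `simulate`, and `infinite_width_prediction` are hypothetical.

```python
import numpy as np


def final_second_moment(width, depth, rng):
    """Mean squared last-layer pre-activation of one randomly initialized network.

    Assumed model (not taken from the abstract): ReLU activation,
    weights ~ N(0, 1/width), branch output scaled by 1/sqrt(depth).
    """
    # Assumption: first-layer pre-activations are i.i.d. standard normal.
    y = rng.standard_normal(width)
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        # Residual update with the 1/sqrt(depth) branch scaling.
        y = y + (w @ np.maximum(y, 0.0)) / np.sqrt(depth)
    return float(np.mean(y ** 2))


def simulate(width, depth, n_nets=50, seed=0):
    """Average the last-layer second moment over n_nets independent networks."""
    rng = np.random.default_rng(seed)
    values = [final_second_moment(width, depth, rng) for _ in range(n_nets)]
    return float(np.mean(values))


def infinite_width_prediction(depth):
    """Variance from the infinite-width ReLU recursion q_l = q_{l-1} * (1 + 1/(2*depth))."""
    return (1.0 + 1.0 / (2.0 * depth)) ** depth


if __name__ == "__main__":
    # Wide-and-shallow, balanced, and narrow-and-deep configurations.
    for width, depth in [(256, 8), (64, 64), (16, 256)]:
        empirical = simulate(width, depth)
        predicted = infinite_width_prediction(depth)
        print(f"width={width:4d} depth={depth:4d} "
              f"empirical={empirical:.3f} predicted={predicted:.3f}")
```

Running the script prints the empirical and predicted second moments side by side for the three configurations. Narrow networks show larger Monte Carlo fluctuations, so increasing `n_nets` gives a steadier comparison.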