Despite remarkable performance on a variety of tasks, many properties of deep neural networks are not yet theoretically understood. One such mystery is the depth degeneracy phenomenon: the deeper you make your network, the closer your network is to a constant function at initialization. In this paper, we examine the evolution of the angle between two inputs to a ReLU neural network as a function of the number of layers. By using combinatorial expansions, we find precise formulas for how fast this angle goes to zero as depth increases. These formulas capture microscopic fluctuations that are not visible in the popular framework of infinite-width limits, and lead to qualitatively different predictions. We validate our theoretical results with Monte Carlo experiments and show that they accurately approximate the behaviour of finite networks. The formulas are given in terms of the mixed moments of correlated Gaussians passed through the ReLU function. We also find a surprising combinatorial connection between these mixed moments and the Bessel numbers that allows us to explicitly evaluate these moments.
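To make the setting concrete, here is a minimal Monte Carlo sketch. It assumes a fully connected ReLU network with zero biases, square layers of width `width`, and i.i.d. Gaussian weights with variance 2/width (He scaling, which rescales norms but leaves angles unchanged). This setup is not necessarily the one used in the paper, and the helper names (`simulate_angles`, `infinite_width_map`) are hypothetical. The sketch compares the average simulated angle with the standard infinite-width angle recursion for ReLU, cos θ_{l+1} = (sin θ_l + (π − θ_l) cos θ_l) / π. This recursion follows from the first mixed moment E[ReLU(X) ReLU(Y)] = (sin θ + (π − θ) cos θ) / (2π) of standard Gaussians X, Y with correlation cos θ, together with E[ReLU(X)^2] = 1/2. The paper's own finite-width formulas are not reproduced here.

```python
import math
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def angle(u, v):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding error
    return math.acos(c)


def infinite_width_map(theta):
    """Infinite-width update of the angle through one ReLU layer."""
    c = (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi
    return math.acos(max(-1.0, min(1.0, c)))


def simulate_angles(theta0, width, depth, rng):
    """Propagate two unit inputs at angle theta0 through one random ReLU net.

    Assumption: weights i.i.d. N(0, 2/width), no biases, input dimension
    equal to the width. Returns the angle after each layer.
    """
    x = [1.0, 0.0] + [0.0] * (width - 2)
    y = [math.cos(theta0), math.sin(theta0)] + [0.0] * (width - 2)
    std = math.sqrt(2.0 / width)
    angles = []
    for _ in range(depth):
        # Both inputs see the same freshly drawn weight matrix at each layer.
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        x = relu([sum(w * a for w, a in zip(row, x)) for row in W])
        y = relu([sum(w * b for w, b in zip(row, y)) for row in W])
        angles.append(angle(x, y))
    return angles


rng = random.Random(0)
theta0, width, depth, trials = 1.0, 50, 8, 20
runs = [simulate_angles(theta0, width, depth, rng) for _ in range(trials)]
mean_mc = [sum(r[l] for r in runs) / trials for l in range(depth)]

theta = theta0
for layer, mc in enumerate(mean_mc, start=1):
    theta = infinite_width_map(theta)
    print(f"layer {layer}: Monte Carlo {mc:.4f}  infinite-width {theta:.4f}")
```

Varying `width` and `depth` gives a quick feel for how far the infinite-width recursion sits from the behaviour of finite networks, which is the kind of gap the abstract refers to.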