Despite remarkable performance on a variety of tasks, many properties of deep neural networks are not yet theoretically understood. One such mystery is the depth degeneracy phenomenon: the deeper you make your network, the closer it gets to a constant function at initialization. In this paper, we examine the evolution of the angle between two inputs to a ReLU neural network as a function of the number of layers. By using combinatorial expansions, we find precise formulas for how fast this angle goes to zero as depth increases. These formulas capture microscopic fluctuations that are not visible in the popular framework of infinite width limits, and lead to qualitatively different predictions. We validate our theoretical results with Monte Carlo experiments and show that they accurately approximate the behaviour of finite networks. \review{We also empirically investigate how the depth degeneracy phenomenon can negatively impact the training of real networks.} The formulas are given in terms of the mixed moments of correlated Gaussians passed through the ReLU function. We also find a surprising combinatorial connection between these mixed moments and the Bessel numbers that allows us to evaluate these moments explicitly.
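As a minimal illustration of the depth degeneracy phenomenon described above, the sketch below passes two inputs through a randomly initialized ReLU network and tracks the angle between their hidden representations layer by layer. The width, depth, initial angle, and He-style Gaussian initialization here are illustrative assumptions for a quick Monte Carlo demonstration, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def angle(u, v):
    """Angle between two vectors, in radians."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def simulate_angle(depth=50, width=256, theta0=np.pi / 4):
    """Track the angle between two inputs as they pass through `depth`
    random ReLU layers with He-style N(0, 2/width) weights (illustrative
    choice; this keeps the norm roughly stable through each ReLU layer)."""
    # Two unit inputs at angle theta0 in the input space.
    x = np.zeros(width); x[0] = 1.0
    y = np.zeros(width); y[0] = np.cos(theta0); y[1] = np.sin(theta0)
    angles = [angle(x, y)]
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        x, y = np.maximum(W @ x, 0.0), np.maximum(W @ y, 0.0)
        angles.append(angle(x, y))
    return angles

angles = simulate_angle()
print(angles[::10])  # the angle shrinks toward 0 as depth grows
```

Running this shows the angle between the two representations collapsing toward zero with depth, which is the degeneracy that the paper's formulas quantify.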