Neural networks are powerful functions with widespread use, but their theoretical behaviour is not fully understood. Creating deep neural networks by stacking many layers has achieved exceptional performance in many applications and contributed to the recent explosion of these methods. Previous works have shown that depth can exponentially increase the expressivity of a network. However, as networks get deeper, they become more susceptible to degeneracy. We observe this degeneracy in the sense that, on initialization, inputs tend to become more and more correlated as they travel through the layers of the network. If a network has too many layers, it tends to approximate a (random) constant function, making it effectively incapable of distinguishing between inputs. This seems to affect the training of the network and cause it to perform poorly, as we empirically investigate in this paper. We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture, and demonstrate how the predicted degeneracy relates to the training dynamics of the network. We also compare this prediction to predictions derived using infinite width networks.
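The degeneracy described above can be observed directly with a small simulation. The following is a minimal sketch, not the prediction algorithm referred to in the abstract: it pushes two inputs through one randomly initialized fully connected ReLU network and records their cosine similarity after every layer. The function name `cosine_similarity_by_layer`, the He-style weight variance `2 / fan_in`, the absence of biases, and the default width and depth are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_by_layer(x1, x2, width=64, depth=50, seed=0):
    """Track the cosine similarity of two inputs through a random ReLU network.

    Assumptions (not taken from the paper): Gaussian weights with variance
    2 / fan_in (He-style initialization), no biases, constant hidden width.
    Returns a list of depth + 1 similarities; entry 0 is for the raw inputs.
    """
    rng = np.random.default_rng(seed)
    h1 = np.asarray(x1, dtype=float)
    h2 = np.asarray(x2, dtype=float)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    similarities = [cosine(h1, h2)]
    for _ in range(depth):
        fan_in = h1.shape[0]
        # Both inputs pass through the SAME random layer.
        weights = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(width, fan_in))
        h1 = np.maximum(weights @ h1, 0.0)
        h2 = np.maximum(weights @ h2, 0.0)
        similarities.append(cosine(h1, h2))
    return similarities


if __name__ == "__main__":
    # Two orthogonal inputs: their similarity starts at exactly 0.
    a = np.zeros(16)
    b = np.zeros(16)
    a[0] = 1.0
    b[1] = 1.0
    sims = cosine_similarity_by_layer(a, b)
    for layer in (0, 1, 5, 10, 25, 50):
        print(f"layer {layer:2d}: cosine similarity = {sims[layer]:.4f}")
```

Under these assumptions, the similarity should climb towards 1 as depth grows, which is the sense in which a very deep network at initialization behaves like a (random) constant function. Varying `width` at a fixed `depth` gives a rough way to compare finite-width behaviour against what an infinite-width analysis would suggest.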