This work derives an analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the loss landscape of neural networks. Our result implies that the origin is a special point in the loss landscape of deep neural networks, where highly nonlinear phenomena emerge. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, which is qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are, in general, insufficient to ease the optimization of neural networks.
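
The depth dependence described above can be illustrated with a toy numerical check. The following is a minimal sketch and not the paper's construction: it assumes a scalar deep linear chain with a single data point, omits the stochastic neurons entirely, and uses hypothetical names (`loss`, `origin_is_local_min`) and arbitrarily chosen values for the data and the weight decay strength. It probes a small sphere around the origin and reports whether any probed perturbation lowers the loss.

```python
import itertools
import math
import random


def loss(weights, x=1.0, y=1.0, weight_decay=0.1):
    """Squared error of a scalar deep linear chain plus L2 weight decay.

    Assumption: one data point (x, y) and scalar weights, chosen only
    for illustration; the abstract does not specify this setting.
    """
    prediction = math.prod(weights) * x
    return (prediction - y) ** 2 + weight_decay * sum(w * w for w in weights)


def origin_is_local_min(num_hidden_layers, radius=1e-2, trials=200, seed=0):
    """Return True if no probed point at distance `radius` from the origin
    has a lower loss than the origin. This is a numerical probe, not a proof."""
    num_weights = num_hidden_layers + 1  # one weight per layer in the chain
    rng = random.Random(seed)
    base = loss([0.0] * num_weights)

    # Probe all sign patterns (diagonal directions) plus random directions.
    directions = [list(s) for s in itertools.product((-1.0, 1.0), repeat=num_weights)]
    directions += [
        [rng.gauss(0.0, 1.0) for _ in range(num_weights)] for _ in range(trials)
    ]

    for direction in directions:
        norm = math.sqrt(sum(v * v for v in direction))
        point = [radius * v / norm for v in direction]
        if loss(point) < base:
            return False  # found a descent direction: origin is not a minimum
    return True
```

Under these assumed settings, `origin_is_local_min(1)` returns `False` and `origin_is_local_min(2)` returns `True`. With one hidden layer the data term contributes a second-order cross term at zero that can outweigh the weight decay, whereas with two or more hidden layers the data term is at least cubic in the weights near zero, so the quadratic weight decay term dominates there. The minimum at the origin is also not global in this toy setting: `loss([0.9, 0.9, 0.9])` is lower than `loss([0.0, 0.0, 0.0])`.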