We study the optimization landscape of deep linear neural networks with the square loss. It is known that, under weak assumptions, there are no spurious local minima and no local maxima. However, the existence and diversity of non-strict saddle points, which can play a role in the dynamics of first-order algorithms, have only been lightly studied. We go a step further with a complete analysis of the optimization landscape at order 2. Among all critical points, we characterize which are global minimizers, strict saddle points, and non-strict saddle points, and we enumerate all the associated critical values. The characterization is simple, involves conditions on the ranks of partial matrix products, and sheds light on the global convergence and implicit regularization phenomena that have been proved or observed when optimizing linear neural networks. In passing, we provide an explicit parameterization of the set of all global minimizers and exhibit large sets of strict and non-strict saddle points.
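For concreteness, here is a minimal sketch of the setting, with notation assumed for illustration rather than taken from the paper: a depth-$H$ linear network with weight matrices $W_1, \dots, W_H$ maps an input matrix $X$ to the product $W_H W_{H-1} \cdots W_1 X$, and the square loss against targets $Y$ reads
\[
L(W_1, \dots, W_H) \;=\; \tfrac{1}{2}\, \bigl\lVert W_H W_{H-1} \cdots W_1 X - Y \bigr\rVert_F^2 .
\]
The rank conditions mentioned above then bear on partial products of the form $W_j W_{j-1} \cdots W_i$ for $1 \le i \le j \le H$.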