Understanding the implicit regularization imposed by neural network architectures and gradient-based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, perhaps surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among other areas, continuous and robust optimization. It is well known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds w.r.t.\ the initialization size. If our bounds were not sharp, the GHA phenomenon would not occur for the basis pursuit optimization problem, which is a contradiction; hence the bounds are sharp. Moreover, we characterize $\textit{which}$ $\ell^1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.
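As a rough illustration of the claim that gradient flow of a DLN with tiny initialization approximates a basis pursuit minimizer, the following minimal sketch may help. Everything specific in it is an assumption made for illustration and is not taken from the paper: the depth-2 parametrization $x = u \odot u - v \odot v$, the uniform initialization $u = v = \alpha$, the squared loss, the explicit Euler discretization of the flow, the hypothetical function name `dln_gradient_descent`, and the toy problem $x_1 + 2x_2 = 2$, whose unique $\ell^1$-minimal solution is $(0, 1)$.

```python
def dln_gradient_descent(A, y, alpha, lr=1e-2, steps=20000):
    """Euler discretization of gradient flow for a depth-2 diagonal linear network.

    Assumptions of this sketch (not from the paper): x = u*u - v*v elementwise,
    u = v = alpha at initialization, and loss 0.5 * ||A x - y||^2.
    """
    m, n = len(A), len(A[0])
    u = [alpha] * n
    v = [alpha] * n
    for _ in range(steps):
        # Current regression vector represented by the network.
        x = [u[j] * u[j] - v[j] * v[j] for j in range(n)]
        # Residual r = A x - y and its back-projection g = A^T r.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Chain rule: dL/du = 2 * u * g and dL/dv = -2 * v * g.
        u = [u[j] - lr * 2.0 * u[j] * g[j] for j in range(n)]
        v = [v[j] + lr * 2.0 * v[j] * g[j] for j in range(n)]
    return [u[j] * u[j] - v[j] * v[j] for j in range(n)]


# Toy under-determined system: one equation, two unknowns.
A = [[1.0, 2.0]]
y = [2.0]
x = dln_gradient_descent(A, y, alpha=1e-3)
print(x)  # expected to land close to the l1-minimal solution (0, 1)
```

In this toy setting, shrinking `alpha` should move the limit closer to $(0, 1)$, while a large `alpha` lets the iterate settle on a different interpolating solution. The step size and iteration count are hand-picked for this example and would need retuning for other problems.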