Despite the empirical success of deep learning (DL), our theoretical understanding of why it works remains limited. One notable paradox is that, although conventional wisdom discourages fitting the training data perfectly, deep neural networks are designed to do just that, yet they generalize well. This study explores this phenomenon, which is attributed to implicit bias. Several sources of implicit bias have been identified, including step size, weight initialization, the optimization algorithm, and the number of parameters. In this work, we investigate the implicit bias that originates from weight initialization. To this end, we study the problem of solving underdetermined linear systems in several settings, examining how initialization affects implicit regularization when deep networks are used to solve such systems. Our findings elucidate the role of initialization in the optimization and generalization paradoxes, contributing to a more comprehensive understanding of DL's performance characteristics.
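As a minimal illustration of how initialization can select among the infinitely many solutions of an underdetermined system, the sketch below runs plain gradient descent on a least-squares objective from two starting points. It is not the deep-network setting studied in this work: the model is a single linear map, and the problem sizes, step size, iteration count, and all names (`gradient_descent`, `x_zero`, `x_rand`, and so on) are assumptions chosen for the example. In this simplified case, the iterates never leave the affine space formed by the initial point plus the row space of `A`, so gradient descent converges to the interpolating solution closest to its starting point.

```python
import numpy as np

def gradient_descent(A, b, x0, steps=20000):
    """Plain gradient descent on 0.5 * ||A x - b||^2 starting from x0."""
    # Step size 1 / sigma_max(A)^2 keeps the iteration stable (illustrative choice).
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(steps):
        x -= lr * A.T @ (A @ x - b)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))   # 3 equations, 8 unknowns: underdetermined
b = rng.standard_normal(3)

# Zero initialization versus a random initialization.
x_zero = gradient_descent(A, b, np.zeros(8))
x0_rand = rng.standard_normal(8)
x_rand = gradient_descent(A, b, x0_rand)

# Closed-form references computed with the Moore-Penrose pseudoinverse.
A_pinv = np.linalg.pinv(A)
x_min_norm = A_pinv @ b                           # minimum-norm interpolating solution
x_closest = x0_rand + A_pinv @ (b - A @ x0_rand)  # interpolating solution closest to x0_rand

print("both fit the data:", np.allclose(A @ x_zero, b), np.allclose(A @ x_rand, b))
print("zero init matches min-norm solution:", np.allclose(x_zero, x_min_norm))
print("random init matches closest solution:", np.allclose(x_rand, x_closest))
```

Both runs fit the data exactly, yet they return different solutions. The only thing that differs between them is the starting point, which is the sense in which initialization acts as an implicit regularizer in this toy case.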