Overparameterized neural networks often exhibit benign overfitting: they achieve excellent generalization even though the number of parameters exceeds the number of training examples. A promising direction for explaining benign overfitting is to relate generalization to the norm of the distance from initialization, motivated by the empirical observation that this distance is often significantly smaller than the weight norm itself. However, existing initialization-dependent complexity analyses measure the distance from initialization by the Frobenius norm and often yield vacuous bounds in practice for overparameterized models. In this paper, we develop initialization-dependent complexity bounds for shallow neural networks with general Lipschitz activation functions. Our bounds depend on the path-norm of the distance from initialization and are derived by introducing a new peeling technique to handle the challenge posed by the initialization-dependent constraint. We also develop a lower bound that is tight up to a constant factor. Finally, we conduct empirical comparisons and show that our generalization analysis implies non-vacuous bounds for overparameterized networks.