This paper explores the implicit bias of overparameterized neural networks of depth greater than two. Our framework considers a family of networks of varying depths that all have the same capacity but different implicitly defined representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights the network needs to represent that function; it reflects the function-space bias associated with the architecture. Our results show that adding linear layers to a ReLU network yields a representation cost that favors functions that can be approximated by a low-rank linear operator composed with a function that has low representation cost under a two-layer network. Specifically, fitting training data with a neural network of minimum representation cost yields an interpolating function that is nearly constant in directions orthogonal to a low-dimensional subspace. In other words, the learned network is approximately a single- or multiple-index model. Our experiments show that when this active-subspace structure is present in the data, adding linear layers can improve generalization and yields a network that is well aligned with the true active subspace.
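The paper's results concern ReLU networks with added linear layers. A simpler case, purely linear networks, already shows why depth pushes toward low rank. For a depth-L linear network, the minimum sum of squared weights needed to represent a matrix A equals L times the sum of A's singular values raised to the power 2/L. For L = 2 this is twice the nuclear norm. The sketch below is our own NumPy illustration, not the authors' code. It builds the balanced SVD factorization that attains this minimum and compares a rank-1 and a rank-2 matrix that have the same Frobenius norm. The helper names `balanced_factorization`, `compose`, and `representation_cost` are hypothetical.

```python
import numpy as np


def balanced_factorization(A, depth):
    """Factor A = W_depth @ ... @ W_1, returned as [W_1, ..., W_depth].

    Each factor carries Sigma**(1/depth) from the SVD of A. This balanced
    choice attains the minimum total squared Frobenius norm over all exact
    depth-`depth` linear factorizations (illustrative helper, not from the paper).
    """
    if depth == 1:
        return [A.copy()]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.diag(s ** (1.0 / depth))
    first = root @ Vt                                    # W_1: input -> singular basis
    middle = [root.copy() for _ in range(depth - 2)]     # hidden k x k layers
    last = U @ root                                      # W_depth: singular basis -> output
    return [first] + middle + [last]


def compose(factors):
    """Return W_depth @ ... @ W_1 for factors listed as [W_1, ..., W_depth]."""
    out = factors[0]
    for W in factors[1:]:
        out = W @ out
    return out


def representation_cost(A, depth):
    """Closed-form minimum sum of squared weights: depth * sum_i sigma_i**(2/depth)."""
    s = np.linalg.svd(A, compute_uv=False)
    return depth * np.sum(s ** (2.0 / depth))


# Two matrices with equal squared Frobenius norm (= 2), but different ranks.
A_low = np.array([[np.sqrt(2.0), 0.0], [0.0, 0.0]])  # rank 1
A_full = np.eye(2)                                    # rank 2

for L in (1, 2, 3, 4):
    cost_low = representation_cost(A_low, L)
    cost_full = representation_cost(A_full, L)
    print(f"depth={L}  rank-1 cost={cost_low:.3f}  rank-2 cost={cost_full:.3f}  "
          f"ratio={cost_full / cost_low:.3f}")
```

At depth 1 the two matrices cost the same. As linear layers are added, the ratio of the rank-2 cost to the rank-1 cost grows as 2^(1 - 1/L). This is the linear-network analogue of the bias toward low-rank structure described above.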