We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation function and depth. We consider two-layer networks with piecewise-linear activations, deep narrow ReLU networks with up to four layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly, in ReLU networks, a fourth layer creates features that represent reflections of the training data about themselves. The Lasso representation offers insight into globally optimal networks and the solution landscape.
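To make the equivalence concrete, here is a minimal sketch of fitting 1-D data via a convex Lasso over a fixed dictionary. The feature choice (ReLU ramps $(x - x_j)_+$ hinged at the training points, plus a constant) and the ISTA solver are illustrative assumptions for a two-layer ReLU network, not the paper's exact construction for every architecture considered.

```python
import numpy as np

def relu_dictionary(x):
    # Columns: ReLU ramp features (x - x_j)_+, one hinged at each
    # training point, plus a constant column for the bias.
    A = np.maximum(x[:, None] - x[None, :], 0.0)
    return np.hstack([A, np.ones((len(x), 1))])

def lasso_ista(A, y, lam=1e-4, iters=5000):
    # ISTA for min_w 0.5*||A w - y||^2 + lam*||w||_1:
    # a gradient step on the quadratic, then soft-thresholding.
    w = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        w = w - step * (A.T @ (A @ w - y))
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

x = np.linspace(0.0, 1.0, 8)
y = np.abs(x - 0.5)          # piecewise-linear target on 1-D data
A = relu_dictionary(x)       # fixed dictionary: depends only on x
w = lasso_ista(A, y)         # convex problem; no network weights trained
pred = A @ w
print(np.max(np.abs(pred - y)))
```

Because the dictionary is fixed once the training inputs are known, the only optimization is the convex Lasso over the coefficients `w`, whose sparsity pattern selects which ramp features (i.e., hidden neurons) are active.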