We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2- and 3-layer networks with piecewise linear activations, as well as rectangular and tree networks with sign activation and arbitrary depth. Interestingly, in absolute-value and symmetrized ReLU networks, a third layer creates features that represent reflections of training data about themselves. The Lasso representation sheds light on globally optimal networks and the solution landscape.
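As a rough illustration of what "a convex Lasso problem with a fixed dictionary" looks like in practice, here is a minimal NumPy sketch. For illustration only, it assumes a 2-layer ReLU dictionary built from a constant column plus ramp features (x - x_j)_+ and (x_j - x)_+ anchored at each training point x_j. This is a simplification and not necessarily the exact dictionary derived in the paper. The names `relu_dictionary` and `lasso_ista` are hypothetical, the bias column is penalized for simplicity, and the solver is plain proximal gradient descent (ISTA), not a method prescribed by the paper.

```python
import numpy as np

def relu_dictionary(x):
    """Hypothetical 2-layer ReLU dictionary for 1-D training data x (shape (n,)).

    Assumption: columns are a constant (bias) column plus the ramps
    (x - x_j)_+ and (x_j - x)_+ anchored at every training point x_j.
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]      # diff[i, j] = x_i - x_j
    right = np.maximum(diff, 0.0)       # (x_i - x_j)_+
    left = np.maximum(-diff, 0.0)       # (x_j - x_i)_+
    ones = np.ones((x.size, 1))         # bias feature (penalized here for simplicity)
    return np.hstack([ones, right, left])

def lasso_ista(A, y, beta, n_iter=20000):
    """Minimize 0.5*||A z - y||^2 + beta*||z||_1 with ISTA (step size 1/L)."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the smooth part's gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = z - (A.T @ (A @ z - y)) / L                           # gradient step
        z = np.sign(w) * np.maximum(np.abs(w) - beta / L, 0.0)   # soft-thresholding
    return z

def objective(A, y, z, beta):
    """Lasso objective value."""
    return 0.5 * np.sum((A @ z - y) ** 2) + beta * np.sum(np.abs(z))

# Usage: fit y = |x| on five 1-D points.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.abs(x)
beta = 1e-3
A = relu_dictionary(x)                  # fixed once the training points are known
z = lasso_ista(A, y, beta)
residual = np.linalg.norm(A @ z - y)
```

Because the dictionary depends only on the training inputs, the weights z are the sole unknowns and the problem is convex. Roughly speaking, nonzero entries of z play the role of active neurons, so a sparse Lasso solution corresponds to a small network.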