We prove that training neural networks on 1-D data is equivalent to solving convex Lasso problems with discrete, explicitly defined dictionary matrices. We consider neural networks with piecewise linear activations and depths ranging from 2 up to an arbitrary but finite number of layers. We first show that two-layer networks with piecewise linear activations are equivalent to Lasso models with a discrete dictionary of ramp functions whose breakpoints lie at the training data points. In certain architectures with absolute value or ReLU activations, a third layer surprisingly creates features that are reflections of the training data about themselves, and additional layers progressively generate reflections of these reflections. The Lasso representation provides valuable insight into globally optimal networks, elucidating their solution landscapes and yielding closed-form solutions in certain special cases. Numerical experiments show that reflections also emerge when standard deep networks are trained with conventional non-convex optimizers. We further demonstrate the theory with autoregressive time series models.
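As a concrete illustration of the stated equivalence, the following is a minimal numerical sketch, not the paper's code: it builds a Lasso dictionary of ReLU ramp atoms (x − b)_+ with breakpoints b at the training points (the two-layer case), optionally augments it with breakpoints at the reflections 2x_j − x_i to mimic the third-layer reflection features, and fits the resulting Lasso problem with scikit-learn. The exact dictionary in the paper (signs, skip terms, normalization) may differ; the reflection formula 2x_j − x_i and all names below are illustrative assumptions.

```python
# Sketch: Lasso over a discrete ramp dictionary built from 1-D training data.
# Assumption: ramp atoms (x - b)_+ with breakpoints at the data points (depth 2),
# plus breakpoints at reflections 2*x_j - x_i (a guess at the depth-3 features).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, size=20))          # 1-D training inputs
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(20)   # noisy targets

def ramp_dictionary(x, breakpoints):
    """Columns: a constant term, a linear term, and one ramp (x - b)_+ per breakpoint b."""
    ramps = np.maximum(x[:, None] - breakpoints[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, ramps])

# Depth 2: breakpoints at the training points themselves.
bp2 = x.copy()

# Depth 3 (hypothetical): add reflections of each point about every other point.
refl = (2.0 * x[None, :] - x[:, None]).ravel()
bp3 = np.unique(np.concatenate([bp2, refl]))

for name, bp in [("depth-2 dictionary", bp2), ("depth-3 dictionary", bp3)]:
    A = ramp_dictionary(x, bp)
    model = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100_000).fit(A, y)
    n_active = int(np.sum(np.abs(model.coef_) > 1e-8))
    mse = np.mean((A @ model.coef_ - y) ** 2)
    print(f"{name}: {A.shape[1]} atoms, {n_active} active, train MSE = {mse:.4f}")
```

The sparse set of active atoms returned by the Lasso plays the role of the hidden neurons of the globally optimal network: each selected ramp corresponds to a neuron with its breakpoint at a data point or, in the deeper case, at a reflection of one data point about another.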