Training deep neural networks (DNNs) by gradient descent to minimize a loss term plus the sum of squared weights corresponds to the common practice of training with weight decay. This paper provides new insights into this learning framework. We characterize the kinds of functions learned by training with weight decay for multi-output (vector-valued) ReLU neural networks. This extends previous characterizations that were limited to single-output (scalar-valued) networks. The characterization requires defining a new class of neural function spaces that we call vector-valued variation (VV) spaces. Via a novel representer theorem, we prove that neural networks (NNs) are optimal solutions to learning problems posed over VV spaces. This representer theorem shows that solutions to these learning problems exist as vector-valued neural networks with widths bounded in terms of the number of training data. Next, via a novel connection to the multi-task lasso problem, we derive new and tighter bounds on the widths of homogeneous layers in DNNs. The bounds are determined by the effective dimensions of the training data embeddings into and out of the layers. This result sheds new light on the architectural requirements for DNNs. Finally, the connection to the multi-task lasso problem suggests a new approach to compressing pre-trained networks.
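The correspondence between a squared-weight penalty and weight decay stated in the opening sentence can be checked numerically for a single plain gradient-descent step. The sketch below is only an illustration under stated assumptions: the linear model, the squared-error loss, the factor of 1/2 on the penalty, and all function names are hypothetical choices made here, not the paper's setup.

```python
import numpy as np


def loss_grad(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def step_penalized(w, X, y, lr, lam):
    # One gradient step on: loss + (lam / 2) * sum of squared weights.
    # The gradient of the penalty term is lam * w.
    return w - lr * (loss_grad(w, X, y) + lam * w)


def step_weight_decay(w, X, y, lr, lam):
    # Shrink the weights by the factor (1 - lr * lam), then take a
    # gradient step on the unpenalized loss evaluated at the original w.
    return (1.0 - lr * lam) * w - lr * loss_grad(w, X, y)


# Toy data (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

w_penalized = step_penalized(w, X, y, lr=0.1, lam=0.01)
w_decayed = step_weight_decay(w, X, y, lr=0.1, lam=0.01)

# The two updates agree up to floating-point rounding.
same = np.allclose(w_penalized, w_decayed)
print(same)
```

The two updates coincide because the gradient of the penalty is proportional to the weights themselves. This holds for plain gradient descent; adaptive optimizers can break the equivalence.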