We investigate the function-space optimality (specifically, the Banach-space optimality) of a large class of shallow neural architectures with multivariate nonlinearities (activation functions). To that end, we construct a new family of Banach spaces defined via a regularization operator, the $k$-plane transform, and a sparsity-promoting norm. We prove a representer theorem stating that the solution sets of learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received recent interest in the neural network community. Our framework is compatible with a number of classical nonlinearities, including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
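To make the kind of architecture described above concrete, the following is a minimal illustrative sketch, not the construction from the paper. It assumes that each neuron applies a multivariate nonlinearity (here the norm activation) to a $k$-dimensional projection of the input, that the projection matrix has orthonormal rows (one reading of "orthogonal weight normalization"), and that the skip connection is affine. The function names `orthonormal_rows`, `norm_activation`, and `shallow_net`, as well as the parameter layout, are hypothetical.

```python
import math
import random


def orthonormal_rows(k, d, rng):
    """Build a k x d matrix (list of rows) with orthonormal rows.

    Uses Gram-Schmidt on random Gaussian vectors; requires k <= d.
    """
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows found so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (numerically) degenerate draws
            rows.append([a / norm for a in v])
    return rows


def norm_activation(z):
    """Multivariate nonlinearity R^k -> R: the Euclidean norm of z."""
    return math.sqrt(sum(a * a for a in z))


def shallow_net(x, neurons, skip_w, skip_b):
    """Evaluate sum_j v_j * rho(A_j x - t_j) plus an affine skip term.

    neurons: list of (v, A, t) with scalar v, k x d matrix A, length-k shift t.
    The affine form of the skip connection is an assumption of this sketch.
    """
    out = skip_b + sum(w * xi for w, xi in zip(skip_w, x))
    for v, A, t in neurons:
        z = [sum(a * xi for a, xi in zip(row, x)) - tj for row, tj in zip(A, t)]
        out += v * norm_activation(z)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 5, 2
    # Three neurons with random outer weights, projections, and shifts.
    neurons = [
        (rng.gauss(0.0, 1.0), orthonormal_rows(k, d, rng), [rng.gauss(0.0, 1.0) for _ in range(k)])
        for _ in range(3)
    ]
    x = [1.0, -2.0, 0.5, 0.0, 3.0]
    print(shallow_net(x, neurons, skip_w=[0.1] * d, skip_b=0.5))
```

With $k = 1$ the norm activation reduces to an absolute value of a single projection, so each neuron depends on the input only through one direction; larger $k$ gives neurons that depend on a $k$-dimensional projection, which is the sense in which such networks resemble multi-index models.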