We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem stating that the solution sets of learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities, including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
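To make the kind of architecture described above concrete, the following is a minimal sketch of a shallow network whose neurons apply a multivariate nonlinearity to a $k$-dimensional projection of the input, plus a skip connection. It is an illustration under stated assumptions, not the construction from the paper: the function names (`gram_schmidt_rows`, `shallow_network`), the choice of the norm activation, the affine form of the skip connection, and the use of Gram-Schmidt to obtain weight matrices with orthonormal rows are all hypothetical choices made here for illustration.

```python
import math

def gram_schmidt_rows(rows):
    """Orthonormalize k linearly independent vectors in R^d (Gram-Schmidt).

    Assumption: orthonormal rows stand in for the orthogonal weight
    normalization mentioned in the abstract.
    """
    basis = []
    for row in rows:
        v = list(row)
        # Remove the components along the directions already in the basis.
        for b in basis:
            coeff = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-12:
            raise ValueError("rows must be linearly independent")
        basis.append([vi / norm for vi in v])
    return basis

def matvec(A, x):
    """Multiply a k-by-d matrix (list of rows) by a vector in R^d."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm_activation(z):
    """Multivariate nonlinearity R^k -> R: the Euclidean norm."""
    return math.sqrt(sum(zi * zi for zi in z))

def shallow_network(x, neurons, skip_weights, skip_bias):
    """Evaluate sum_j v_j * rho(A_j x - t_j) + (skip_weights . x + skip_bias).

    neurons: list of (v_j, A_j, t_j), with A_j a k-by-d matrix whose rows
    are assumed orthonormal and t_j a bias vector in R^k.
    The affine skip term is an assumption made for this sketch.
    """
    out = skip_bias + sum(w * xi for w, xi in zip(skip_weights, x))
    for v, A, t in neurons:
        z = [zi - ti for zi, ti in zip(matvec(A, x), t)]
        out += v * norm_activation(z)
    return out
```

For example, with $d = 3$ and $k = 2$, a single neuron can be built as `(2.0, gram_schmidt_rows([[1, 1, 0], [0, 1, 1]]), [0.0, 0.0])`. Each neuron depends on the input only through the $k$-dimensional projection `A x`, which is the sense in which such a model resembles a multi-index model.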