We introduce a class of fully-connected neural networks whose activation functions, rather than acting pointwise, rescale each feature vector by a factor that depends only on its norm. We call such networks radial neural networks. They extend previous work on rotation-equivariant networks, which considered rescaling activations in less generality. We prove universal approximation theorems for radial neural networks, including the more difficult cases of bounded width and unbounded domain. Our proof techniques are novel and differ from those used in the pointwise case. Additionally, radial neural networks exhibit a rich group of orthogonal change-of-basis symmetries on the vector space of trainable parameters. Factoring out these symmetries leads to a practical lossless model compression algorithm. Optimizing the compressed model by gradient descent is equivalent to running projected gradient descent on the full model.
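To make the two central ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a radial activation of the form rho(v) = h(|v|) v / |v|, a hypothetical radial profile `shifted_relu` (h(r) = max(0, r - 0.5), chosen only for illustration), hidden layers computing rho(W x + b), and an affine last layer with no activation. Under these assumptions, changing basis in each hidden layer by an orthogonal matrix Q_i (W_i -> Q_i W_i Q_{i-1}^T, b_i -> Q_i b_i, with Q_0 and Q_L the identity) leaves the network function unchanged, because rho(Q v) = Q rho(v) whenever Q preserves norms. All function names here are hypothetical.

```python
import numpy as np

def radial_activation(v, h, eps=1e-12):
    """Radial rescaling rho(v) = h(|v|) * v / |v|.

    The direction of v is kept; only its norm is mapped through h.
    eps guards against division by zero at v = 0 (an assumption of this sketch).
    """
    r = np.linalg.norm(v)
    return (h(r) / max(r, eps)) * v

def shifted_relu(r, shift=0.5):
    # Hypothetical radial profile h(r) = max(0, r - shift), for illustration only.
    return max(0.0, r - shift)

def forward(weights, biases, x, h=shifted_relu):
    """Hidden layers apply rho(W x + b); the last layer is purely affine (assumed)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = radial_activation(W @ x + b, h)
    return weights[-1] @ x + biases[-1]

def random_orthogonal(n, rng):
    # The Q factor of a QR decomposition of a Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def apply_symmetry(weights, biases, rng):
    """Orthogonal change of basis in every hidden layer:
    W_i -> Q_i W_i Q_{i-1}^T and b_i -> Q_i b_i, with identity at input and output."""
    widths = [W.shape[0] for W in weights]
    Qs = ([np.eye(weights[0].shape[1])]
          + [random_orthogonal(n, rng) for n in widths[:-1]]
          + [np.eye(widths[-1])])
    new_weights = [Qs[i + 1] @ W @ Qs[i].T for i, W in enumerate(weights)]
    new_biases = [Qs[i + 1] @ b for i, b in enumerate(biases)]
    return new_weights, new_biases

# Usage: a small network with layer widths 3 -> 5 -> 4 -> 2.
rng = np.random.default_rng(0)
dims = [3, 5, 4, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
x = rng.standard_normal(3)

W2, b2 = apply_symmetry(weights, biases, rng)
print(np.allclose(forward(weights, biases, x), forward(W2, b2, x)))  # True
```

A pointwise activation such as ReLU commutes only with permutations and positive rescalings of coordinates, not with general rotations, so this continuous family of parameter symmetries is specific to the radial case. It is these redundant directions in parameter space that the compression result factors out.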