In this paper, we explain the universal approximation capabilities of deep residual neural networks through geometric nonlinear control. Inspired by recent work establishing links between residual networks and control systems, we provide a general sufficient condition for a residual network to have the power of universal approximation by requiring the activation function, or one of its derivatives, to satisfy a quadratic differential equation. Many activation functions used in practice satisfy this assumption, exactly or approximately, and we show this property to be sufficient for a sufficiently deep neural network with $n+1$ neurons per layer to approximate arbitrarily well, on a compact set and with respect to the supremum norm, any continuous function from $\mathbb{R}^n$ to $\mathbb{R}^n$. We further show that this result holds for very simple architectures in which the weights need to assume only two values. The first key technical contribution consists of relating the universal approximation problem to controllability of an ensemble of control systems corresponding to a residual network and leveraging classical Lie algebraic techniques to characterize controllability. The second technical contribution is to identify monotonicity as the bridge between controllability of finite ensembles and uniform approximability on compact sets.
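To make the assumption on the activation function concrete, the following minimal sketch numerically checks two standard identities: $\tanh' = 1 - \tanh^2$ and, for the logistic function $\sigma$, $\sigma' = \sigma - \sigma^2$. It also checks that the derivative of softplus is the logistic function, which illustrates the "or one of its derivatives" clause. The general form $\xi' = a_0 + a_1 \xi + a_2 \xi^2$ used here is an assumption about what "quadratic differential equation" means, since the abstract does not spell it out. All function names are hypothetical and chosen only for illustration.

```python
import math


def logistic(x):
    """Logistic (sigmoid) function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    """Softplus function log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def quadratic_residual(f, coeffs, x):
    """Return f'(x) - (a0 + a1*f(x) + a2*f(x)**2).

    The quadratic form is an assumed reading of the abstract's
    "quadratic differential equation"; a residual near zero means
    f satisfies that equation at x, up to finite-difference error.
    """
    a0, a1, a2 = coeffs
    y = f(x)
    return derivative(f, x) - (a0 + a1 * y + a2 * y * y)


def max_residual(f, coeffs, points):
    """Largest absolute residual over a list of sample points."""
    return max(abs(quadratic_residual(f, coeffs, x)) for x in points)


# Sample points on the compact interval [-5, 5].
POINTS = [i / 10.0 for i in range(-50, 51)]

# tanh' = 1 - tanh^2, i.e. (a0, a1, a2) = (1, 0, -1).
TANH_RESIDUAL = max_residual(math.tanh, (1.0, 0.0, -1.0), POINTS)

# logistic' = logistic - logistic^2, i.e. (a0, a1, a2) = (0, 1, -1).
LOGISTIC_RESIDUAL = max_residual(logistic, (0.0, 1.0, -1.0), POINTS)

# The derivative of softplus is the logistic function, so softplus is
# covered through its derivative rather than directly.
SOFTPLUS_GAP = max(abs(derivative(softplus, x) - logistic(x)) for x in POINTS)

if __name__ == "__main__":
    print("tanh residual:    ", TANH_RESIDUAL)
    print("logistic residual:", LOGISTIC_RESIDUAL)
    print("softplus gap:     ", SOFTPLUS_GAP)
```

All three quantities should be small, limited only by the finite-difference error. The check is numerical and covers a sampled interval only, so it illustrates the condition rather than proving it.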