Deep neural networks have achieved remarkable success in diverse applications, which motivates the search for a solid theoretical foundation. Recent research has identified $\max\{2,d_x,d_y\}$ as the minimal width required for neural networks with leaky ReLU activations, input dimension $d_x$, and output dimension $d_y$ to universally approximate $L^p(\mathbb{R}^{d_x},\mathbb{R}^{d_y})$ on compacta. Here, we present an alternative proof of this minimal width by directly constructing approximating networks with a coding scheme that leverages the properties of leaky ReLUs and standard $L^p$ results. The resulting construction has a minimal interior dimension of $1$, independent of the input and output dimensions, which allows us to show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions. Furthermore, we demonstrate that the normalizing flow LU-Net serves as a distributional universal approximator. We extend our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^{d},\mathbb{R}^{d})$ on compacta when the dimension $d\geq 2$, which provides a constructive proof of a classical theorem of Brenier and Gangbo. In addition, we use a topological argument to establish that, for feedforward neural networks (FNNs) with monotone Lipschitz continuous activations, $d_x+1$ is a lower bound on the minimal width required for uniform universal approximation of continuous functions $C^0(\mathbb{R}^{d_x},\mathbb{R}^{d_y})$ on compacta when $d_x\geq d_y$.
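As a small illustration of two ingredients named above, the following sketch evaluates the width formula $\max\{2,d_x,d_y\}$ and checks that a leaky ReLU is invertible on the real line. It is not the construction used in the proof. The function names and the slope value `alpha = 0.1` are assumptions chosen for illustration only.

```python
# Illustrative sketch only: names and the default slope are hypothetical,
# not taken from the paper.

def leaky_relu(x: float, alpha: float = 0.1) -> float:
    """Leaky ReLU with negative-side slope alpha, assumed to lie in (0, 1)."""
    return x if x >= 0 else alpha * x


def leaky_relu_inverse(y: float, alpha: float = 0.1) -> float:
    """Inverse of leaky_relu; it exists because the map is strictly increasing."""
    return y if y >= 0 else y / alpha


def minimal_width(d_x: int, d_y: int) -> int:
    """Width max{2, d_x, d_y} stated for L^p approximation on compacta."""
    return max(2, d_x, d_y)


if __name__ == "__main__":
    # Round trip through the activation and its inverse recovers the input.
    for x in (-2.0, -0.5, 0.0, 1.5):
        assert abs(leaky_relu_inverse(leaky_relu(x)) - x) < 1e-12

    # The width never drops below 2, even for scalar input and output.
    print(minimal_width(1, 1), minimal_width(3, 5), minimal_width(7, 2))
```

The round-trip check reflects the property that sets leaky ReLU apart from the standard ReLU, which maps all negative inputs to zero and therefore has no inverse.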