Recent work has increasingly focused on determining the minimum width required for deep, narrow Multi-Layer Perceptrons (MLPs) to achieve the universal approximation property. Among the open cases, approximating continuous functions under the uniform norm is particularly difficult, as shown by the large gap between the known lower and upper bounds. To address this problem, we propose a framework that reduces finding the minimum width of deep, narrow MLPs to determining a purely geometric function $w(d_x, d_y)$, which depends only on the input dimension $d_x$ and the output dimension $d_y$. The framework rests on two key steps. First, we show that deep, narrow MLPs with a small additional width can approximate a $C^2$-diffeomorphism. Second, using this result, we prove that $w(d_x, d_y)$ equals the optimal minimum width required for deep, narrow MLPs to achieve universality. Combining this framework with the Whitney embedding theorem, we obtain an upper bound on the minimum width of $\max(2d_x+1, d_y) + \alpha(\sigma)$, where $0 \leq \alpha(\sigma) \leq 2$ is a constant depending on the activation function $\sigma$. Furthermore, we prove a lower bound of $4$ on the minimum width when the input and output dimensions are both equal to two.
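
As a rough illustration of the stated upper bound, the following sketch simply evaluates $\max(2d_x+1, d_y) + \alpha(\sigma)$ for given dimensions. The function name `width_upper_bound` and the idea of passing $\alpha(\sigma)$ as a plain number are assumptions made for this example; the abstract does not say which value of $\alpha$ corresponds to which activation function.

```python
def width_upper_bound(d_x: int, d_y: int, alpha: float) -> float:
    """Evaluate max(2*d_x + 1, d_y) + alpha, the upper bound quoted in the abstract.

    `alpha` stands in for alpha(sigma); the abstract only states that it lies
    in [0, 2], so the caller must supply a value (hypothetical here).
    """
    if d_x < 1 or d_y < 1:
        raise ValueError("input and output dimensions must be positive")
    if not 0 <= alpha <= 2:
        raise ValueError("alpha must lie in [0, 2]")
    return max(2 * d_x + 1, d_y) + alpha


if __name__ == "__main__":
    # For d_x = d_y = 2 the bound ranges over the admissible alpha values.
    for alpha in (0, 1, 2):
        print(alpha, width_upper_bound(2, 2, alpha))
```

For $d_x = d_y = 2$ this gives a value between $5$ and $7$ depending on $\alpha(\sigma)$, which can be read alongside the lower bound of $4$ stated for the same case.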