We study the size of a neural network needed to approximate the maximum function over $d$ inputs in the most basic setting: ReLU networks, with approximation error measured in the $L_2$ norm with respect to continuous distributions. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and depth 3 networks, and between depth 3 and depth 5 networks. We also give a construction of depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ that approximates the maximum function, significantly improving upon the depth requirements of the best previously known bounds for networks with linearly bounded width. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest because they apply to the widely studied and widely used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
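For context on what a depth improvement means here, the following is a minimal sketch of the classical pairwise "tournament" baseline, not the construction from this work. It relies on the standard identity $\max(a, b) = \mathrm{ReLU}(a - b) + \mathrm{ReLU}(b) - \mathrm{ReLU}(-b)$, which computes the maximum exactly with about $\lceil \log_2 d \rceil$ hidden ReLU layers and $\mathcal{O}(d)$ width. The function names are hypothetical and chosen only for illustration, and the layer count below follows one possible convention (hidden layers only).

```python
def relu(x):
    """ReLU activation: max(x, 0)."""
    return x if x > 0 else 0.0


def pairwise_max(a, b):
    """Exact max of two numbers using three ReLU units.

    relu(b) - relu(-b) reproduces b, and relu(a - b) adds the
    excess of a over b when a is larger.
    """
    return relu(a - b) + relu(b) - relu(-b)


def relu_tournament_max(xs):
    """Classical baseline (an illustration, not this paper's network).

    Repeatedly takes pairwise maxima. Each round corresponds to one
    hidden ReLU layer, so d inputs need ceil(log2(d)) rounds.
    Returns (maximum, number_of_rounds).
    """
    values = list(xs)
    rounds = 0
    while len(values) > 1:
        nxt = [
            pairwise_max(values[i], values[i + 1])
            for i in range(0, len(values) - 1, 2)
        ]
        if len(values) % 2 == 1:
            # An unpaired value is carried forward unchanged; in a real
            # network this identity map would also be built from ReLUs.
            nxt.append(values[-1])
        values = nxt
        rounds += 1
    return values[0], rounds


if __name__ == "__main__":
    value, rounds = relu_tournament_max([3, -1, 7, 2, 7, -5, 0, 4])
    print(value, rounds)
```

The baseline is exact but its depth grows like $\log d$; the abstract's claim concerns reaching depth $\mathcal{O}(\log(\log(d)))$ at width $\mathcal{O}(d)$ by settling for $L_2$ approximation rather than exact computation.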