We prove an exponential size separation between depth 2 and depth 3 neural networks (with real inputs), when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution supported in the unit ball, under the mild assumption that the weights of the depth 2 network are exponentially bounded. This resolves an open problem posed in \citet{safran2019depth}, and proves that the curse of dimensionality manifests itself in depth 2 approximation even in cases where the target function can be represented efficiently by a depth 3 network. Previously, the lower bounds used to separate depth 2 from depth 3 networks required that at least one of the Lipschitz constant, the target accuracy, or (some measure of) the size of the domain of approximation scale \emph{polynomially} with the input dimension, whereas in our result these parameters are fixed to be \emph{constants} independent of the input dimension: our parameters are simultaneously optimal. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of a worst-case to average-case random self-reducibility argument, allowing us to leverage depth 2 threshold circuit lower bounds in a new domain.
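Schematically, and with illustrative placeholders rather than the exact exponents, error metric, and weight bound of the formal theorem stated in the body of the paper, the separation asserts the existence of a distribution $\mu$ supported on the unit ball $\mathbb{B}^d \subseteq \mathbb{R}^d$ and an $\mathcal{O}(1)$-Lipschitz target $f$ such that
\begin{itemize}
    \item $f$ is computed by a depth 3 network of size $\mathrm{poly}(d)$, yet
    \item every depth 2 network $N$ with weight magnitudes at most $2^{\mathrm{poly}(d)}$ and width at most $2^{o(d^{c})}$, for some universal constant $c > 0$, incurs approximation error at least a universal constant $\epsilon_0 > 0$, e.g.\ $\mathbb{E}_{x \sim \mu}\big[(N(x) - f(x))^2\big] \geq \epsilon_0$.
\end{itemize}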