We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in $[0,1]^{d}$, assuming exponentially bounded weights. This addresses an open problem posed in \citet{safran2019depth}, and proves that the curse of dimensionality manifests in depth 2 approximation, even in cases where the target function can be represented efficiently using depth 3. Previously, lower bounds used to separate depth 2 from depth 3 required that at least one of the Lipschitz parameter, the target accuracy, or (some measure of) the size of the domain of approximation scale polynomially with the input dimension, whereas we fix the former two and restrict our domain to the unit hypercube. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of an average- to worst-case random self-reducibility argument, reducing the problem to lower bounds for threshold circuits.