We present a theoretical approach to overcoming the curse of dimensionality using a neural computation algorithm that can be distributed across several machines. Our modular distributed deep learning paradigm, termed \textit{neural pathways}, achieves arbitrary accuracy while loading only a small number of parameters into GPU VRAM. Formally, we prove that for every error level $\varepsilon>0$ and every Lipschitz function $f:[0,1]^n\to \mathbb{R}$, one can construct a neural pathways model that uniformly approximates $f$ to accuracy $\varepsilon$ over $[0,1]^n$ while requiring only networks of $\mathcal{O}(\varepsilon^{-1})$ parameters to be held in memory and $\mathcal{O}(\varepsilon^{-1}\log(\varepsilon^{-1}))$ parameters to be loaded during the forward pass. This improves upon the optimal bounds for traditional non-distributed deep learning models, namely ReLU MLPs, which require $\mathcal{O}(\varepsilon^{-n/2})$ parameters to achieve the same accuracy. The only other known deep learning models that break the curse of dimensionality are MLPs with super-expressive activation functions. However, we show that these models have infinite VC dimension even under bounded depth and width restrictions, unlike the neural pathways model, which implies that only the neural pathways model generalizes. Our analysis is validated experimentally on both regression and classification tasks, demonstrating that our model achieves superior performance compared to larger centralized benchmarks.
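To make the gap between the two rates concrete, consider an illustrative back-of-the-envelope comparison (the choices $n=20$ and $\varepsilon=10^{-2}$ are hypothetical, and constants hidden in the $\mathcal{O}$-notation are ignored): the bounds above scale as
\[
\underbrace{\varepsilon^{-n/2} = 10^{20}}_{\text{ReLU MLP}}
\quad\text{versus}\quad
\underbrace{\varepsilon^{-1} = 10^{2}}_{\text{in memory}}
\;\;\text{and}\;\;
\underbrace{\varepsilon^{-1}\log(\varepsilon^{-1}) \approx 4.6\times 10^{2}}_{\text{forward pass}}
\]
for the neural pathways model, whose in-memory parameter count is independent of the dimension $n$.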