The remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, the high dimensionality of such inverse problems has left current theory, which predicts that network size should grow exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, we provide a general method for bounding the complexity required for a neural network to approximate a H\"older (or uniformly) continuous function defined on a high-dimensional set with a low-complexity structure. The approach is based on the observation that the existence of a Johnson-Lindenstrauss (JL) embedding $A\in\mathbb{R}^{d\times D}$ of a given high-dimensional set $S\subset\mathbb{R}^D$ into a low-dimensional cube $[-M,M]^d$ implies that for any H\"older (or uniformly) continuous function $f:S\to\mathbb{R}^p$, there exists a H\"older (or uniformly) continuous function $g:[-M,M]^d\to\mathbb{R}^p$ such that $g(Ax)=f(x)$ for all $x\in S$. Hence, if one has a neural network that approximates $g:[-M,M]^d\to\mathbb{R}^p$, then a first layer implementing the JL embedding $A$ can be prepended to it to obtain a neural network that approximates $f:S\to\mathbb{R}^p$. By pairing JL embedding results with results on the approximation of H\"older (or uniformly) continuous functions by neural networks, one then obtains bounds on the complexity required for a neural network to approximate H\"older (or uniformly) continuous functions on high-dimensional sets. The end result is a general theoretical framework that can better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.
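The key observation can be illustrated numerically. If $A$ satisfies $\|Ax-Ay\|_2\ge(1-\epsilon)\|x-y\|_2$ on $S$ and $f$ is Hölder with constant $L$ and exponent $\alpha\in(0,1]$, then $g(Ax):=f(x)$ is well defined on $A(S)$ and Hölder there with constant at most $L(1-\epsilon)^{-\alpha}$; it then remains to extend $g$ off $A(S)$. The following is a minimal sketch under several assumptions not taken from the text: a Gaussian random matrix serves as the JL embedding, the low-complexity set $S$ is a finite point cloud in a 2-dimensional subspace of $\mathbb{R}^D$, the target is scalar-valued ($p=1$), and $g$ is extended with the McShane formula $g(z)=\min_k\big(f(x_k)+L'\|z-Ax_k\|^\alpha\big)$, one standard Hölder extension that is not necessarily the construction used in this work. All function names (`run_demo`, `mcshane_extension`, and so on) are hypothetical, and the exact extension $g$ stands in for the neural network that would approximate it.

```python
import math
import random


def gaussian_jl_matrix(d, D, rng):
    """d x D matrix with i.i.d. N(0, 1/d) entries (one common JL construction; an assumption here)."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(D)] for _ in range(d)]


def matvec(A, x):
    """Dense matrix-vector product A @ x, i.e. the prepended 'JL layer'."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def holder_constant(points, values, alpha):
    """Smallest L with |values[i] - values[j]| <= L * dist(points[i], points[j])**alpha on this finite set."""
    L = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = dist(points[i], points[j])
            if r > 0.0:
                L = max(L, abs(values[i] - values[j]) / r ** alpha)
    return L


def mcshane_extension(points, values, L, alpha):
    """Scalar Hölder extension g(z) = min_k (values[k] + L * |z - points[k]|**alpha), for 0 < alpha <= 1."""
    def g(z):
        return min(v + L * dist(z, p) ** alpha for p, v in zip(points, values))
    return g


def run_demo(D=200, d=8, n=60, alpha=0.5, seed=0):
    """Hypothetical demo: embed S with A, build g on A(S), and check g(Ax) = f(x) on S."""
    rng = random.Random(seed)

    # Hypothetical low-complexity set S: n points in a 2-dimensional coordinate subspace of R^D.
    S = []
    for _ in range(n):
        x = [0.0] * D
        x[0], x[1] = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        S.append(x)

    # Hypothetical target f: S -> R, Hölder continuous with exponent alpha on S.
    def f(x):
        return math.sqrt(abs(x[0])) + 0.5 * x[1]

    A = gaussian_jl_matrix(d, D, rng)
    AS = [matvec(A, x) for x in S]

    # Empirical distortion ||Ax - Ay|| / ||x - y|| over all pairs of S.
    ratios = [dist(AS[i], AS[j]) / dist(S[i], S[j])
              for i in range(n) for j in range(i + 1, n)]

    # Define g on A(S) by g(Ax) = f(x), then extend it to all of R^d (in particular to [-M, M]^d).
    fS = [f(x) for x in S]
    L = holder_constant(AS, fS, alpha)
    g = mcshane_extension(AS, fS, L, alpha)

    # Composite map x -> g(Ax): JL layer first, then the (here exact) function g.
    max_err = max(abs(g(matvec(A, x)) - f(x)) for x in S)
    M = max(abs(c) for z in AS for c in z)  # half side length of a cube [-M, M]^d containing A(S)
    return {"min_ratio": min(ratios), "max_ratio": max(ratios),
            "holder_L": L, "cube_M": M, "max_err": max_err}


if __name__ == "__main__":
    print(run_demo())
```

Because the McShane minimum always includes the term with $z=Ax_i$, the composite map reproduces $f$ on $S$ up to floating-point rounding, while the distortion ratios show how far $A$ is from an isometry on this particular sample. For a vector-valued $f:S\to\mathbb{R}^p$, one could extend each coordinate separately, at the cost of a factor of at most $\sqrt{p}$ in the Euclidean Hölder constant.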