Covering numbers of (deep) ReLU networks have been used to characterize approximation-theoretic performance, to upper-bound prediction error in nonparametric regression, and to quantify classification capacity. These results rely on covering number upper bounds obtained via explicit constructions of coverings. Lower bounds on covering numbers do not appear to be available in the literature. The present paper fills this gap by deriving tight (up to multiplicative constants) lower and upper bounds on the metric entropy (i.e., the logarithm of the covering numbers) of fully connected networks with bounded weights, sparse networks with bounded weights, and fully connected networks with quantized weights. The tightness of these bounds yields a fundamental understanding of the impact of sparsity, quantization, bounded versus unbounded weights, and network output truncation. Moreover, the bounds allow one to characterize fundamental limits of neural network transformation, including network compression, and lead to sharp upper bounds on the prediction error in nonparametric regression through deep networks. In particular, we remove a $\log^6(n)$-factor from the best known sample complexity rate for estimating Lipschitz functions via deep networks, thereby establishing optimality. Finally, we identify a systematic relation between optimal nonparametric regression and optimal approximation through deep networks, unifying numerous results in the literature and revealing underlying general principles.
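For reference, the central quantities can be stated precisely; the notation below is generic and not necessarily that of the paper. For a function class $\mathcal{F}$ equipped with a metric $\rho$, the $\epsilon$-covering number $N(\epsilon;\mathcal{F},\rho)$ is the smallest number of $\rho$-balls of radius $\epsilon$ whose union contains $\mathcal{F}$, and the metric entropy is its logarithm:

$$N(\epsilon;\mathcal{F},\rho)=\min\Big\{m\in\mathbb{N}:\exists\,f_1,\dots,f_m\ \text{such that}\ \mathcal{F}\subseteq\bigcup_{i=1}^{m}\{f:\rho(f,f_i)\le\epsilon\}\Big\},\qquad H(\epsilon;\mathcal{F},\rho)=\log N(\epsilon;\mathcal{F},\rho).$$

"Tight up to multiplicative constants" then means matching two-sided bounds $c\,\psi(\epsilon)\le H(\epsilon;\mathcal{F},\rho)\le C\,\psi(\epsilon)$ for some rate function $\psi$ and constants $0<c\le C$ independent of $\epsilon$. Similarly, the optimality claim for Lipschitz regression can be read against the classical minimax rate for estimating Lipschitz functions on $[0,1]^d$ from $n$ samples (Stone, 1982),

$$\inf_{\hat f}\;\sup_{f\ \text{Lipschitz}}\;\mathbb{E}\,\|\hat f-f\|_{L^2}^2\;\asymp\;n^{-\frac{2}{2+d}},$$

so that removing the $\log^6(n)$-factor from the deep-network upper bound makes it match this lower bound exactly.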