The Neural Tangent Kernel (NTK) has emerged as a powerful tool for providing memorization, optimization, and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer of $\Omega(N)$ neurons, where $N$ is the number of training samples. Furthermore, there is increasing evidence that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as few as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
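As a rough numerical illustration of the quantity discussed above, the following sketch builds the empirical NTK Gram matrix $K = JJ^\top$, where $J$ is the Jacobian of the network output with respect to all parameters, and reports its smallest eigenvalue for a network whose widths are of order $\sqrt{N}$. The ReLU activation, bias-free layers, He-style Gaussian initialization, Gaussian inputs, and the helper name `ntk_gram` are all assumptions of this sketch and are not taken from the paper.

```python
import numpy as np


def ntk_gram(X, widths, seed=0):
    """Empirical NTK Gram matrix K = J J^T for a bias-free ReLU network
    with scalar linear output (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    dims = [X.shape[1], *widths, 1]
    # He-style Gaussian initialization (an assumption of this sketch).
    Ws = [rng.normal(0.0, np.sqrt(2.0 / dims[i]), size=(dims[i], dims[i + 1]))
          for i in range(len(dims) - 1)]
    rows = []
    for x in X:
        # Forward pass: store each layer's input and the ReLU masks.
        acts, masks, h = [x], [], x
        for W in Ws[:-1]:
            pre = h @ W
            mask = (pre > 0).astype(float)
            h = pre * mask
            acts.append(h)
            masks.append(mask)
        # Backward pass: delta is d(output)/d(pre-activation) of layer l.
        delta = np.ones(1)
        grads = [None] * len(Ws)
        for l in range(len(Ws) - 1, -1, -1):
            grads[l] = np.outer(acts[l], delta)  # d(output)/d(W_l)
            if l > 0:
                delta = (Ws[l] @ delta) * masks[l - 1]
        rows.append(np.concatenate([g.ravel() for g in grads]))
    J = np.stack(rows)          # shape: (N, number of parameters)
    return J @ J.T, J.shape[1]


# Example: N = 64 samples, hidden widths of order sqrt(N) = 8.
N, d = 64, 16
X = np.random.default_rng(0).normal(size=(N, d))
K, n_params = ntk_gram(X, widths=[8, 8])
lam_min = np.linalg.eigvalsh(K)[0]  # eigvalsh returns ascending eigenvalues
print(f"parameters: {n_params}, samples: {N}, smallest NTK eigenvalue: {lam_min:.4g}")
```

Here the network has 200 parameters for 64 samples, so the number of parameters exceeds $N$ while no layer has $\Omega(N)$ neurons. The printed eigenvalue is a single empirical draw and is not a substitute for the lower bound proved in the paper.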