The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well understood and as a result enjoy algorithmic benefits, which can be demonstrated to hold for wide neural network architectures in synthetic settings. These benefits include faster optimization, reliable uncertainty quantification, and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This requirement raises the concern that architectures used in practice do not exhibit the behavior the NTK predicts. Here, we supplement previous work on the NTK by empirically investigating whether the limiting regime predicts practically relevant behavior of large-width architectures. Our results demonstrate that this is not the case across multiple domains. This observed disconnect between theory and practice further calls into question the degree to which NTK theory should inform architectural and algorithmic choices.
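As background on the central object here: the empirical NTK of a network f with parameters θ is the kernel Θ(x, x′) = ⟨∇_θ f(x; θ), ∇_θ f(x′; θ)⟩, which NTK theory predicts becomes deterministic and approximately constant during training as width grows. The following minimal JAX sketch computes this quantity; it is not from the paper, and the two-layer MLP, its widths, and the variance-scaled initialization are illustrative assumptions only.

```python
import jax
import jax.numpy as jnp

def f(params, x):
    """Illustrative two-layer MLP with scalar output (not the paper's model)."""
    h = jnp.tanh(x @ params["W1"])
    return (h @ params["W2"])[0]

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = <grad_theta f(x1; theta), grad_theta f(x2; theta)>."""
    g1 = jax.grad(f)(params, x1)  # gradient w.r.t. the parameter pytree
    g2 = jax.grad(f)(params, x2)
    # Inner product taken over all parameter leaves.
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(g1),
                               jax.tree_util.tree_leaves(g2)))

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
d, width = 4, 512  # hypothetical sizes; "wide" here means width >> depth
params = {
    "W1": jax.random.normal(key1, (d, width)) / jnp.sqrt(d),      # 1/sqrt(fan-in)
    "W2": jax.random.normal(key2, (width, 1)) / jnp.sqrt(width),  # scaling at init
}
x1, x2 = jnp.ones(d), jnp.arange(float(d))
print(empirical_ntk(params, x1, x2))
```

In the idealized infinite-width limit, rerunning this computation during training would return an essentially unchanged kernel; the abstract's claim is that at the widths and depths used in practice, this prediction can fail.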