Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute and more broadly applicable than infinite-width NTKs. For networks with $O$ output units (e.g. an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$ computation. Most existing applications have therefore used one of a handful of approximations that yield $N \times N$ kernel matrices, saving orders of magnitude of computation but with little to no justification. We prove that one such approximation, which we call "sum of logits", converges to the true eNTK at initialization for any network with a wide final "readout" layer. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.
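To make the size difference concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define the approximation, so the sketch assumes the "sum of logits" kernel is the inner product of parameter gradients of the scaled logit sum $\frac{1}{\sqrt{O}} \sum_j f_j(x)$. The toy two-layer ReLU network, its $1/\sqrt{m}$ scaling, and all function names (`init_params`, `jacobian`, `full_entk`, `sum_of_logits_kernel`) are hypothetical and chosen only for illustration.

```python
import math
import random


def init_params(d, m, O, seed=0):
    """Toy two-layer network (an assumption): f(x) = V relu(W x) / sqrt(m)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]  # hidden layer, m x d
    V = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(O)]  # readout layer, O x m
    return W, V


def jacobian(params, x):
    """Return O rows; row j is the gradient of logit f_j(x) w.r.t. all parameters."""
    W, V = params
    m, O = len(W), len(V)
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    h = [max(p, 0.0) for p in pre]
    s = 1.0 / math.sqrt(m)
    rows = []
    for j in range(O):
        # d f_j / d W[k][i] = s * V[j][k] * relu'(pre_k) * x_i
        g_w = [s * V[j][k] * (1.0 if pre[k] > 0 else 0.0) * xi
               for k in range(m) for xi in x]
        # d f_j / d V[l][k] = s * h_k if l == j, else 0
        g_v = [s * h[k] if l == j else 0.0 for l in range(O) for k in range(m)]
        rows.append(g_w + g_v)
    return rows


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def full_entk(params, xs):
    """Full eNTK: an (N*O) x (N*O) Gram matrix of per-logit gradients."""
    rows = [r for x in xs for r in jacobian(params, x)]
    return [[dot(a, b) for b in rows] for a in rows]


def sum_of_logits_kernel(params, xs):
    """Assumed approximation: N x N Gram matrix of gradients of sum_j f_j / sqrt(O)."""
    grads = []
    for x in xs:
        J = jacobian(params, x)
        # Gradients are linear, so the gradient of the sum is the sum of the rows.
        grads.append([sum(col) / math.sqrt(len(J)) for col in zip(*J)])
    return [[dot(a, b) for b in grads] for a in grads]


if __name__ == "__main__":
    params = init_params(d=3, m=64, O=4)
    xs = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [0.0, 1.0, 1.0]]
    K = full_entk(params, xs)
    P = sum_of_logits_kernel(params, xs)
    print("full eNTK:", len(K), "x", len(K[0]))
    print("sum-of-logits kernel:", len(P), "x", len(P[0]))
```

Under this assumed definition, each entry of the small kernel equals the sum of the corresponding $O \times O$ block of the full eNTK divided by $O$, so the sketch stores $N^2$ numbers instead of $(NO)^2$. A practical implementation would use automatic differentiation rather than hand-written gradients.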