Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute and more broadly applicable than infinite-width NTKs. For networks with $O$ output units (e.g., an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $\mathcal{O}((NO)^2)$ memory and up to $\mathcal{O}((NO)^3)$ computation. Most existing applications have therefore used one of a handful of approximations that yield $N \times N$ kernel matrices, saving orders of magnitude of computation but with little to no justification. We prove that one such approximation, which we call "sum of logits," converges to the true eNTK at initialization for any network with a wide final "readout" layer. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.
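To make the approximation concrete, here is a minimal NumPy sketch on a hypothetical two-layer tanh network in NTK parameterization, $f(x) = W_2 \tanh(W_1 x / \sqrt{d}) / \sqrt{H}$, with the parameter Jacobians written out by hand so no autodiff library is needed. It builds the full $NO \times NO$ eNTK and the $N \times N$ "sum of logits" kernel (the Gram matrix of gradients of $\sum_o f_o(x)$), then compares $K_{\text{sum}} \otimes I_O$ against the full eNTK as the readout width $H$ grows. The network, the helper names (`param_jacobian`, `sum_of_logits_kernel`, `relative_error`), the $1/O$ normalization, and the relative Frobenius error metric are all assumptions made for this illustration, not details taken from the source.

```python
import numpy as np


def init_params(rng, d, width, n_out):
    """Hypothetical two-layer network parameters, all entries i.i.d. N(0, 1)."""
    return {
        "W1": rng.standard_normal((width, d)),
        "W2": rng.standard_normal((n_out, width)),
    }


def param_jacobian(params, X):
    """Jacobian of every output w.r.t. every parameter, shape (N, O, P).

    Assumed network (NTK parameterization, chosen for this sketch):
        f(x) = W2 @ tanh(W1 @ x / sqrt(d)) / sqrt(H)
    where H is the hidden width feeding the final readout layer.
    """
    W1, W2 = params["W1"], params["W2"]
    N, d = X.shape
    H, O = W1.shape[0], W2.shape[0]
    hidden = np.tanh(X @ W1.T / np.sqrt(d))  # (N, H)
    dact = 1.0 - hidden**2                    # tanh'(pre-activation), (N, H)
    # d f_o / d W1[j, k] = W2[o, j] * tanh'(z_j) * x_k / sqrt(H * d)
    J1 = np.einsum("oj,nj,nk->nojk", W2, dact, X) / np.sqrt(H * d)
    # d f_o / d W2[p, j] = [o == p] * tanh(z_j) / sqrt(H)
    J2 = np.einsum("op,nj->nopj", np.eye(O), hidden) / np.sqrt(H)
    return np.concatenate([J1.reshape(N, O, -1), J2.reshape(N, O, -1)], axis=2)


def full_entk(J):
    """Full eNTK, shape (N*O, N*O); row/column index is n * O + o."""
    N, O, P = J.shape
    Jf = J.reshape(N * O, P)
    return Jf @ Jf.T


def sum_of_logits_kernel(J):
    """N x N kernel from the gradient of sum_o f_o(x).

    The 1/O scaling is an assumption: it makes the kernel match each diagonal
    block of the full eNTK in expectation over the readout weights W2.
    """
    O = J.shape[1]
    g = J.sum(axis=1)  # gradient of the summed logits, (N, P)
    return g @ g.T / O


def relative_error(params, X):
    """Relative Frobenius error of kron(sum-of-logits kernel, I_O) vs. full eNTK."""
    J = param_jacobian(params, X)
    O = J.shape[1]
    full = full_entk(J)
    approx = np.kron(sum_of_logits_kernel(J), np.eye(O))
    return np.linalg.norm(full - approx) / np.linalg.norm(full)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 5))  # N = 4 inputs of dimension d = 5
    for width in (10, 100, 1000, 10000):
        params = init_params(rng, d=5, width=width, n_out=3)
        print(f"width={width:>6}: relative error {relative_error(params, X):.4f}")
```

Under this setup one would expect the printed error to shrink as `width` grows, in line with the convergence claim above; the exact values depend on the seed. For clarity the sketch forms the full Jacobian and then sums over outputs, but in practice the point of the approximation is that one can backpropagate $\sum_o f_o(x)$ directly, needing one gradient per input instead of $O$ and an $N \times N$ Gram matrix instead of an $NO \times NO$ one.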