We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of $\alpha = O(T)$ suffices for the NTK approximation to be valid until training time $T$. Our bound is tight and improves on the previous bound of Chizat et al. 2019, which required a larger rescaling factor of $\alpha = O(T^2)$.
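For concreteness, the setting can be sketched as follows. This is a minimal sketch assuming the standard lazy-training formulation; the symbols $h$, $w_0$, $y$, $\bar h$, $\bar w$, $C$, and $C'$ are notation introduced here for illustration and do not appear in the abstract, and constants depending on the smoothness of the model are suppressed.

```latex
% Assumed notation: h is the model, w_0 the initialization, y the targets.
% Rescaled model \alpha h trained by gradient flow on the square loss:
\[
  \frac{dw}{dt} = -\nabla_w \, \frac{1}{2\alpha^2}\,\bigl\| \alpha h(w(t)) - y \bigr\|^2,
  \qquad w(0) = w_0 .
\]
% Linearization of the model around initialization, trained the same way:
\[
  \bar h(w) = h(w_0) + Dh(w_0)\,(w - w_0),
  \qquad
  \frac{d\bar w}{dt} = -\nabla_{\bar w} \, \frac{1}{2\alpha^2}\,\bigl\| \alpha \bar h(\bar w(t)) - y \bigr\|^2,
  \qquad \bar w(0) = w_0 .
\]
% "NTK approximation valid until time T" read as: the two outputs stay close.
\[
  \bigl\| \alpha h(w(T)) - \alpha \bar h(\bar w(T)) \bigr\| \ \text{is small}
  \quad\text{once}\quad
  \alpha \ge C\,T \ \ (\text{this work}),
  \quad\text{compared with}\quad
  \alpha \ge C'\,T^2 \ \ (\text{Chizat et al. 2019}).
\]
```

Under this reading, doubling the training horizon $T$ calls for roughly doubling $\alpha$, rather than quadrupling it as the earlier $T^2$ requirement would.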