This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. Grokking has been reported to occur only under certain hyperparameter settings, which makes it critical to identify the settings that lead to it. However, because grokking appears only after a large number of epochs, searching for such hyperparameters is time-consuming. We propose a low-cost method that predicts grokking without training for a large number of epochs. In essence, we show that the learning curve of the first few epochs indicates whether grokking will occur later on: if certain oscillations appear in the early epochs, grokking can be expected once the model is trained for a much longer period. To detect these oscillations, we propose using the spectral signature of the learning curve, obtained by applying the Fourier transform and quantifying the amplitude of its low-frequency components. We also present additional experiments aimed at explaining the cause of these oscillations and characterizing the loss landscape.
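As an illustration of the spectral signature idea, the following is a minimal sketch of how the low-frequency amplitude of an early learning curve might be computed. The abstract does not give the exact definition, so the function name `low_frequency_amplitude`, the linear detrending step, the choice of `num_bins=5`, and the synthetic curves are all assumptions made for this example, not the paper's procedure.

```python
import numpy as np


def low_frequency_amplitude(curve, num_bins=5):
    """Mean FFT amplitude over the lowest non-DC frequency bins of a curve.

    Hypothetical helper: the detrending and the number of bins are
    illustrative assumptions, not values taken from the paper.
    """
    y = np.asarray(curve, dtype=float)
    steps = np.arange(y.size)

    # Remove a linear trend so that the overall decrease of the loss
    # does not dominate the low-frequency bins (an assumption).
    slope, intercept = np.polyfit(steps, y, deg=1)
    residual = y - (slope * steps + intercept)

    # One-sided spectrum of the real-valued residual, scaled by length.
    amplitudes = np.abs(np.fft.rfft(residual)) / y.size

    # Skip bin 0 (the DC component) and average the next `num_bins` bins.
    return float(amplitudes[1:1 + num_bins].mean())


# Synthetic early-epoch curves, for illustration only.
steps = np.arange(200)
smooth = np.exp(-steps / 50.0)
oscillating = smooth + 0.2 * np.sin(2 * np.pi * 3 * steps / 200)

print("smooth curve score:     ", low_frequency_amplitude(smooth))
print("oscillating curve score:", low_frequency_amplitude(oscillating))
```

On these synthetic curves, the one with a slow oscillation yields a larger score than the smooth decay. How such a score would be turned into a grokking prediction (for example, a decision threshold) is not specified in the abstract and is left open here.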