A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in the recently reported Grokking phenomena. While mainly reflected as a sudden increase in test accuracy, Grokking is also believed to be a phenomenon beyond the lazy-learning/Gaussian Process (GP) regime, one involving feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular-addition teachers. We provide analytical predictions for the feature-learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.
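To make the modular-addition teacher-student setup concrete, the following is a minimal sketch, not the paper's exact architecture or training protocol: a teacher labels one-hot-encoded pairs with $(a+b) \bmod P$, and a small one-hidden-layer student is trained by full-batch gradient descent with weight decay on a random half of all pairs. All specific values (modulus `P`, hidden width `H`, learning rate, weight decay) are illustrative assumptions.

```python
import numpy as np

P = 11  # teacher modulus (hypothetical choice for illustration)
rng = np.random.default_rng(0)

# Full dataset: all P*P pairs, one-hot encoded as a 2P-dimensional input.
pairs = np.array([(a, b) for a in range(P) for b in range(P)])
X = np.zeros((P * P, 2 * P))
X[np.arange(P * P), pairs[:, 0]] = 1.0       # one-hot for a
X[np.arange(P * P), P + pairs[:, 1]] = 1.0   # one-hot for b
y = (pairs[:, 0] + pairs[:, 1]) % P          # teacher labels

# Random train/test split; Grokking studies train on a fraction of pairs.
perm = rng.permutation(P * P)
train, test = perm[: P * P // 2], perm[P * P // 2:]

# One-hidden-layer ReLU student, full-batch gradient descent + weight decay.
H, lr, wd = 64, 0.1, 1e-4
W1 = rng.normal(0, 1 / np.sqrt(2 * P), (2 * P, H))
W2 = rng.normal(0, 1 / np.sqrt(H), (H, P))
loss_history = []
for step in range(500):
    h = np.maximum(X[train] @ W1, 0.0)       # hidden ReLU features
    logits = h @ W2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(train)), y[train]]).mean()
    loss_history.append(loss)
    # Backprop of softmax cross-entropy through the two layers.
    grad = p.copy()
    grad[np.arange(len(train)), y[train]] -= 1.0
    grad /= len(train)
    gW2 = h.T @ grad + wd * W2
    gh = (grad @ W2.T) * (h > 0)
    gW1 = X[train].T @ gh + wd * W1
    W1 -= lr * gW1
    W2 -= lr * gW2
```

In this picture, delayed generalization (test accuracy jumping long after training accuracy saturates) is the signature the abstract refers to; the sketch only sets up the system and fits the training set.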