A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in the recently reported Grokking phenomena. Although Grokking manifests mainly as a sudden increase in test accuracy, it is also believed to lie beyond the lazy-learning/Gaussian Process (GP) regime and to involve feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models, one with a cubic-polynomial teacher and one with a modular-addition teacher. We provide analytical predictions for the feature-learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.