A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in recently reported Grokking phenomena. Although it manifests mainly as a sudden increase in test accuracy, Grokking is also believed to go beyond the lazy-learning/Gaussian Process (GP) regime and to involve feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models, one with a cubic-polynomial teacher and one with a modular-addition teacher. We provide analytical predictions of the feature learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.