The phenomenon of grokking in over-parameterized neural networks has attracted significant interest. In grokking, a network first memorizes the training set, reaching zero training error while test error remains near random; after prolonged further training, it undergoes a sharp transition from no generalization to perfect generalization. Our study comprises extensive experiments together with a survey of research on the mechanism behind grokking. Through these experiments, we gain insight into how the phenomenon depends on the fraction of training data, the model architecture, and the optimization procedure. Researchers have proposed a variety of viewpoints on the mechanism of grokking, and we introduce several of these perspectives.
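To make the setting concrete, the following is a minimal sketch of the kind of experiment in which grokking is typically observed: learning modular addition from a fraction of all input pairs and training far past the point of zero training error. The modulus, MLP architecture, and hyperparameters here are illustrative assumptions, not the exact configuration of our experiments; weight decay is included because it is widely reported to matter for grokking.

```python
# Minimal grokking sketch (assumed hyperparameters): learn (a + b) mod p
# from a fraction of all pairs, then train long enough to see test
# accuracy jump well after training accuracy saturates.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97                # modulus; an assumption, common in grokking demos
frac_train = 0.4      # training data fraction, one axis we vary

# Build the full (a, b) -> (a + b) mod p dataset, then split it.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Small MLP over concatenated one-hot encodings of a and b.
def one_hot(x):
    return torch.nn.functional.one_hot(x, p).float()

X = torch.cat([one_hot(pairs[:, 0]), one_hot(pairs[:, 1])], dim=1)
model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))

# AdamW with nonzero weight decay, as in typical grokking setups.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(X[idx]).argmax(1) == labels[idx]).float().mean().item()

for step in range(1, 50_001):   # prolonged training is the point
    opt.zero_grad()
    loss = loss_fn(model(X[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
              f"test acc {accuracy(test_idx):.3f}")
```

With this kind of logging, grokking appears as training accuracy reaching 1.0 early while test accuracy stays near chance for many steps before rising sharply.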