We design and analyze a new paradigm for building supervised learning networks, driven only by local optimization rules and without relying on a global error function. Traditional neural networks with a fixed topology are made up of identical nodes and derive their expressiveness from an appropriate adjustment of connection weights. In contrast, our network stores new knowledge in its nodes exactly and instantaneously, in the form of a lookup table; only afterwards is some of this information structured and incorporated into the network geometry. The training error is zero by construction and remains zero throughout the topology-transformation phase. This phase consists of a small number of local topological transformations, such as splitting or merging nodes and adding binary connections between them, and the choice of operation is driven solely by optimizing expressivity at the local scale. What we primarily seek in a learning network is its ability to generalize, i.e., its capacity to answer correctly questions whose answers it has never learned. On numerous examples of classification tasks, we show that the networks generated by our algorithm systematically reach such a state of perfect generalization once the number of learned examples becomes sufficiently large. We report on the dynamics of this change of state and show that it is abrupt, with the distinctive characteristics of a first-order phase transition, a phenomenon already observed in traditional learning networks and known as grokking. Besides proposing a non-potential approach to the construction of learning networks, our algorithm makes it possible to view the grokking transition in a new light, in which the acquisition of training data and the topological structuring of those data are completely decoupled phenomena.
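As a hedged toy sketch (not the authors' implementation), the two-stage idea described above can be illustrated as follows: examples are first stored exactly in a lookup table, so the training error is zero by construction, and structure then emerges from repeated local merge transformations that fuse nodes carrying the same label. The `LookupNetwork` class, its wildcard patterns, and the specific merge rule are invented here purely for exposition:

```python
# Hypothetical illustration of "store first, structure later" learning.
# A node is a pattern (a tuple whose entries are symbols or the wildcard "*")
# mapped to a label. Learning is exact storage; a local merge transformation
# fuses two same-label nodes whose patterns conflict in exactly one position.

def matches(pattern, x):
    """A stored pattern matches an input if every non-wildcard entry agrees."""
    return all(p == "*" or p == xi for p, xi in zip(pattern, x))


class LookupNetwork:
    def __init__(self):
        self.nodes = {}  # pattern -> label

    def learn(self, x, y):
        # Exact, instantaneous storage: no global error function is minimized.
        self.nodes[tuple(x)] = y

    def predict(self, x):
        for pattern, label in self.nodes.items():
            if matches(pattern, x):
                return label
        return None  # no stored node covers this input

    def merge_step(self):
        """Apply ONE local transformation, if any is available: fuse two
        same-label nodes whose patterns conflict (both non-wildcard and
        unequal) in exactly one position, wildcarding that position."""
        pats = list(self.nodes)
        for i, a in enumerate(pats):
            for b in pats[i + 1:]:
                if self.nodes[a] != self.nodes[b]:
                    continue
                diff = [k for k in range(len(a))
                        if a[k] != "*" and b[k] != "*" and a[k] != b[k]]
                if len(diff) == 1:
                    merged = tuple(
                        "*" if (k == diff[0] or a[k] == "*" or b[k] == "*")
                        else a[k]
                        for k in range(len(a))
                    )
                    label = self.nodes.pop(a)
                    self.nodes.pop(b)
                    self.nodes[merged] = label
                    return True
        return False


# Usage: learn a few 3-bit examples labeled by their first bit, then let
# local merges run to exhaustion. Trained answers are preserved throughout,
# and the merged wildcard node now also covers an input never learned.
net = LookupNetwork()
for x, y in [((0, 0, 0), 0), ((0, 0, 1), 0), ((0, 1, 0), 0), ((1, 1, 1), 1)]:
    net.learn(x, y)
while net.merge_step():
    pass
print(net.nodes)              # {(1, 1, 1): 1, (0, '*', '*'): 0}
print(net.predict((0, 1, 1)))  # 0 -- correct, though never learned
```

The merge rule here stands in for the paper's local topological transformations: each step is chosen purely from local information (two nodes and their labels), yet the resulting wildcard nodes answer inputs outside the training set, a toy analogue of generalization emerging from structuring alone.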