Unraveling the emergence of collective learning in systems of coupled artificial neural networks has broad implications for machine learning, neuroscience, and society. Here we introduce a minimal model that distills several recent decentralized algorithms into a competition between two terms: the local learning dynamics of the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters across the ensemble. We derive an effective theory for linear networks showing that the coarse-grained behavior of our system is equivalent to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts depth-dependent disorder-order-disorder phase transitions in the parameter solutions. These transitions reveal a depth-delayed onset of a collective learning phase and a low-rank microscopic learning path. We validate the theory on coupled ensembles of realistic neural networks trained on the MNIST dataset under privacy constraints. Experiments confirm that individual networks, each trained only on private data, can fully generalize to unseen data classes when the collective learning phase emerges. Our work establishes the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.
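The following is a minimal sketch of the competition between the two terms. It is not the authors' model or code. It makes several assumptions. Each unit is a single-layer linear model trained with a squared loss on its own private data. Coupling is all-to-all and diffusive, with a hypothetical strength `kappa`. The dynamics $\dot{w}_i = -\nabla L_i(w_i) + \kappa \sum_j (w_j - w_i)$ are integrated with explicit Euler steps. Because the sketch has depth one, it cannot show the depth-dependent transitions. It only illustrates how the coupling homogenizes parameters and lets a unit recover an input direction absent from its own data.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (input vector, scalar target) pairs


def local_grad(w: Vector, data: Dataset) -> Vector:
    """Gradient of 0.5 * mean((w.x - y)^2) over one unit's private data."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for k, xk in enumerate(x):
            g[k] += err * xk / len(data)
    return g


def train_coupled(datasets: List[Dataset], kappa: float,
                  lr: float = 0.1, steps: int = 2000) -> List[Vector]:
    """Euler steps of dw_i/dt = -grad L_i(w_i) + kappa * sum_j (w_j - w_i).

    Assumption: every unit is coupled to every other unit (all-to-all graph),
    and all units start from the zero vector.
    """
    dim = len(datasets[0][0][0])
    ws = [[0.0] * dim for _ in datasets]
    for _ in range(steps):
        grads = [local_grad(w, d) for w, d in zip(ws, datasets)]
        new_ws = []
        for i, w in enumerate(ws):
            # Diffusive term: pulls unit i toward the other units (self term is zero).
            coupling = [sum(other[k] - w[k] for other in ws) for k in range(dim)]
            new_ws.append([w[k] - lr * grads[i][k] + lr * kappa * coupling[k]
                           for k in range(dim)])
        ws = new_ws
    return ws


# Hypothetical toy setup: a teacher w* = (1, -2); each unit privately sees
# only one input direction, so no single unit's data determines w*.
datasets = [
    [([1.0, 0.0], 1.0)],   # unit 0 only observes the first feature
    [([0.0, 1.0], -2.0)],  # unit 1 only observes the second feature
]

uncoupled = train_coupled(datasets, kappa=0.0)
coupled = train_coupled(datasets, kappa=1.0)
print("uncoupled:", uncoupled)
print("coupled:  ", coupled)
```

With `kappa = 0`, unit 0 learns only the first component and leaves the second at its initial value of 0. With `kappa = 1`, both units approach the teacher (1, -2), even though each one saw a single input direction. The teacher, datasets, and step size are illustrative choices only.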