Unraveling the emergence of collective learning in systems of coupled artificial neural networks has broad implications for physics, machine learning, neuroscience and society. Here we introduce a minimal model that condenses several recent decentralized algorithms into a competition between two terms: the local learning dynamics in the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters of the ensemble. We derive the coarse-grained behavior of our model via an effective theory for linear networks, which we show is analogous to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts depth-dependent disorder-order-disorder phase transitions in the parameters' solutions that reveal the onset of a collective learning phase, along with a depth-induced delay of the critical point and a robust shape of the microscopic learning path. We validate our theory in realistic ensembles of coupled nonlinear networks trained on the MNIST dataset under privacy constraints. Interestingly, experiments confirm that individual networks, each trained only on private data, can fully generalize to unseen data classes when the collective learning phase emerges. Our work elucidates the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.
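The competition between local learning and diffusive coupling can be illustrated with a toy sketch. This is not the model analyzed in the paper: it assumes each unit is a single scalar parameter with a quadratic private loss, that the units sit on a ring, and that the coupling pulls each parameter toward the mean of its two neighbors. The names `simulate`, `spread`, `targets` and `coupling` are hypothetical. A depth-one toy of this kind only shows the homogenizing effect of the coupling; it does not reproduce the depth-dependent phase transitions described above.

```python
import random


def simulate(targets, coupling, lr=0.1, steps=2000):
    """Toy ring of scalar units (an illustrative assumption, not the paper's model).

    Each unit i minimizes a private loss 0.5 * (w_i - targets[i])**2 and is
    also pulled toward the mean of its two ring neighbors with strength
    `coupling`.
    """
    n = len(targets)
    w = [0.0] * n
    for _ in range(steps):
        new_w = []
        for i in range(n):
            # Local learning term: gradient of the private quadratic loss.
            grad = w[i] - targets[i]
            # Diffusive term: w[i - 1] wraps around to the last unit when i == 0.
            neighbor_mean = 0.5 * (w[i - 1] + w[(i + 1) % n])
            new_w.append(w[i] - lr * grad + coupling * (neighbor_mean - w[i]))
        w = new_w
    return w


def spread(values):
    """Population variance of the parameters across units."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Private targets play the role of quenched disorder in this toy setting.
rng = random.Random(0)
targets = [rng.gauss(0.0, 1.0) for _ in range(20)]

for coupling in (0.0, 0.05, 0.5):
    w = simulate(targets, coupling)
    print(f"coupling={coupling:.2f}  parameter variance={spread(w):.4f}")
```

In this sketch the variance across units shrinks as the coupling grows, while the ensemble mean of the parameters stays at the mean of the private targets, because the ring coupling redistributes parameter values without changing their sum.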