The dynamical evolution of a neural network during training is a fascinating subject of study. First-principles derivations of the generic evolution of variables in statistical-physics systems have proved useful for describing training dynamics conceptually, which in practice means numerically solving equations such as the Fokker-Planck equation. Simulating entire networks, however, inevitably runs into the curse of dimensionality. In this paper, we use the Fokker-Planck equation to simulate the probability-density evolution of individual weight matrices in the bottleneck layers of a simple two-bottleneck-layer autoencoder, and we compare the theoretical evolution against the empirical one by examining the output data distributions. We also derive physically relevant partial differential equations, such as the Callan-Symanzik and Kardar-Parisi-Zhang equations, from the underlying dynamical equation.
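For orientation, a minimal sketch of the kind of dynamical equation referred to above is the standard Fokker-Planck equation for the parameter density $p(w,t)$, written here under the illustrative assumptions of gradient-flow drift from the loss $L(w)$ and a constant isotropic diffusion coefficient $D$ arising from stochastic-gradient noise (notation introduced for this example, not necessarily that of the paper):
\[
\frac{\partial p(w,t)}{\partial t} \;=\; \nabla_w \cdot \bigl( p(w,t)\, \nabla_w L(w) \bigr) \;+\; D\, \nabla_w^2\, p(w,t).
\]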