Deep generative models have become ubiquitous due to their ability to learn and sample from complex distributions. Despite the proliferation of frameworks, the relationships among these models remain largely unexplored, a gap that hinders the development of a unified theory of AI learning. We address two central challenges: clarifying the connections between different deep generative models and deepening our understanding of their learning mechanisms. We focus on Restricted Boltzmann Machines (RBMs), known for their universal approximation capabilities for discrete distributions. By introducing a reciprocal-space formulation, we reveal a connection between RBMs, diffusion processes, and coupled bosons. We show that at initialization the RBM operates at a saddle point, where the local curvature is determined by the singular values, whose distribution follows the Marchenko-Pastur law and exhibits rotational symmetry. During training, this rotational symmetry is broken due to hierarchical learning, in which different degrees of freedom progressively capture features at multiple levels of abstraction. The resulting symmetry breaking in the energy landscape, reminiscent of Landau theory, is characterized by the singular values and the eigenvector matrix of the weight matrix. We derive the corresponding free energy in a mean-field approximation and show that, in the limit of an infinite-size RBM, the reciprocal variables are Gaussian distributed. Our findings indicate that in this regime there are modes for which the diffusion process does not converge to the Boltzmann distribution. To illustrate our results, we trained replicas of RBMs with different hidden-layer sizes on the MNIST dataset. Our findings bridge the gap between disparate generative frameworks and shed light on the processes underpinning learning in generative models.
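As a minimal numerical sketch of the initialization claim (assuming i.i.d. Gaussian weights; the dimensions n_visible, n_hidden and the scale sigma below are illustrative choices, not values from the paper), the following compares the empirical spectrum of a random RBM weight matrix against the Marchenko-Pastur support and mean:

```python
import numpy as np

# Hypothetical RBM dimensions and weight scale (illustrative assumptions).
n_visible, n_hidden, sigma = 784, 256, 0.01
rng = np.random.default_rng(0)

# RBM weight matrix at initialization: i.i.d. Gaussian entries.
W = rng.normal(0.0, sigma, size=(n_visible, n_hidden))

# Squared singular values of W, rescaled: eigenvalues of W^T W / n_visible.
evals = np.linalg.svd(W, compute_uv=False) ** 2 / n_visible

# Marchenko-Pastur predictions for aspect ratio q = n_hidden / n_visible:
# support [lam_minus, lam_plus] and mean sigma^2.
q = n_hidden / n_visible
lam_minus = sigma**2 * (1 - np.sqrt(q)) ** 2
lam_plus = sigma**2 * (1 + np.sqrt(q)) ** 2

print(f"empirical support: [{evals.min():.3e}, {evals.max():.3e}]")
print(f"MP support:        [{lam_minus:.3e}, {lam_plus:.3e}]")
print(f"empirical mean: {evals.mean():.3e}   MP mean: {sigma**2:.3e}")
```

For larger matrices the empirical histogram of the rescaled squared singular values converges to the full Marchenko-Pastur density, not merely its support and mean.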