Generative autoregressive neural networks (ARNNs) have recently achieved exceptional results in image and language generation, contributing to the growing popularity of generative models in both scientific and commercial applications. This work presents a physical interpretation of ARNNs by rewriting the Boltzmann distribution of binary pairwise interacting systems in autoregressive form. In the resulting ARNN architecture, the weights and biases of the first layer correspond to the couplings and external fields of the Hamiltonian. The architecture also contains widely used structures, such as residual connections and a recurrent structure, each with a clear physical meaning. However, the number of hidden-layer parameters grows exponentially with system size, which makes direct application of this architecture infeasible. Nevertheless, its explicit formulation allows statistical physics techniques to be used to derive new ARNNs for specific systems. As examples, new effective ARNN architectures are derived for two well-known mean-field systems, the Curie-Weiss and Sherrington-Kirkpatrick models. These architectures approximate the Boltzmann distributions of the corresponding physical models better than other commonly used ARNN architectures. The connection established between the physics of the system and the ARNN architecture provides a way to derive new neural network architectures for other interacting systems and to interpret existing ones from a physical perspective.
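To make the autoregressive reformulation concrete, the following sketch illustrates the general idea. It is not the architecture derived in this work. It assumes the common convention H(s) = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i with s_i ∈ {-1, +1}. Under that convention, the exact conditional of spin i given the previous spins is a sigmoid with log-odds 2β(h_i + Σ_{j<i} J_ij s_j) + log ρ_i^+ - log ρ_i^-. The first term is linear in s_<i, with weights J_ij and bias h_i. The ρ_i^± are partial partition functions of the spins after i, with s_i fixed to ±1. The hypothetical helper tail_weight computes ρ by brute-force enumeration, which exposes the exponential cost mentioned above. The hypothetical helper cw_conditional_up treats an assumed Curie-Weiss instance with J_ij = J0/n and h_i = h0. There the same sum depends on the prefix only through its magnetization, so it collapses to O(n) terms.

```python
import itertools
import math


def tail_weight(prefix, J, h, beta):
    """Partial partition function rho(prefix): sum over the spins after the
    prefix of exp(-beta * H_tail), where H_tail keeps only the terms of
    H(s) = -sum_{a<b} J[a][b] s_a s_b - sum_a h[a] s_a that involve at least
    one spin after the prefix. Only the upper triangle of J is read."""
    n, k = len(h), len(prefix)
    total = 0.0
    for rest in itertools.product((-1, 1), repeat=n - k):
        s = list(prefix) + list(rest)
        e = 0.0
        for b in range(k, n):
            e -= h[b] * s[b]
            for a in range(b):
                e -= J[a][b] * s[a] * s[b]
        total += math.exp(-beta * e)
    return total


def conditional_up(prefix, J, h, beta):
    """Exact p(s_i = +1 | s_<i) for i = len(prefix), written as a sigmoid."""
    i = len(prefix)
    # Linear part: bias h_i plus couplings to the spins already fixed,
    # i.e. what a first layer with weights J and biases h would compute.
    local_field = h[i] + sum(J[a][i] * prefix[a] for a in range(i))
    # Non-linear part: log-ratio of the partial partition functions of the
    # remaining spins; brute force costs 2**(n - i - 1) terms per call.
    correction = (math.log(tail_weight(list(prefix) + [1], J, h, beta))
                  - math.log(tail_weight(list(prefix) + [-1], J, h, beta)))
    return 1.0 / (1.0 + math.exp(-(2.0 * beta * local_field + correction)))


def autoregressive_prob(s, J, h, beta):
    """p(s) as the product of exact conditionals p(s_i | s_<i)."""
    p = 1.0
    for i, s_i in enumerate(s):
        p_up = conditional_up(s[:i], J, h, beta)
        p *= p_up if s_i == 1 else 1.0 - p_up
    return p


def boltzmann_prob(s, J, h, beta):
    """Reference value exp(-beta * H(s)) / Z by full enumeration."""
    n = len(h)

    def energy(c):
        return -(sum(h[a] * c[a] for a in range(n))
                 + sum(J[a][b] * c[a] * c[b]
                       for a in range(n) for b in range(a + 1, n)))

    Z = sum(math.exp(-beta * energy(c))
            for c in itertools.product((-1, 1), repeat=n))
    return math.exp(-beta * energy(s)) / Z


def cw_conditional_up(prefix, n, J0, h0, beta):
    """Curie-Weiss case (J[a][b] = J0 / n, h[a] = h0): the tail weight depends
    on the prefix only through its magnetization, so a sum over the
    magnetization of the remaining spins replaces the 2**m enumeration."""
    i = len(prefix)
    m = n - i - 1  # spins still to be sampled after s_i

    def log_rho(S):  # S = sum of the first i + 1 spins, s_i included
        terms = []
        for k in range(m + 1):
            M = 2 * k - m  # magnetization of the remaining m spins
            terms.append(math.log(math.comb(m, k)) + beta * (
                (J0 / n) * (S * M + (M * M - m) / 2) + h0 * M))
        top = max(terms)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in terms))

    S_prev = sum(prefix)
    log_odds = (2.0 * beta * (h0 + (J0 / n) * S_prev)
                + log_rho(S_prev + 1) - log_rho(S_prev - 1))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A quick sanity check for a small instance (for example n = 5) is to enumerate all 2^n configurations and confirm that autoregressive_prob matches boltzmann_prob. The Curie-Weiss helper can be checked the same way against conditional_up with J_ij = J0/n. The reduction in cw_conditional_up uses elementary combinatorics chosen for illustration. It should not be read as the specific Curie-Weiss or Sherrington-Kirkpatrick architectures derived in this work.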