Generative autoregressive neural networks (ARNNs) have recently demonstrated exceptional results in image and language generation tasks, contributing to the growing popularity of generative models in both scientific and commercial applications. This work presents a physical interpretation of ARNNs by reformulating the Boltzmann distribution of binary pairwise interacting systems in autoregressive form. In the resulting ARNN architecture, the weights and biases of the first layer correspond to the couplings and external fields of the Hamiltonian, and widely used structures such as residual connections and a recurrent architecture emerge with clear physical meanings. However, the number of hidden-layer parameters grows exponentially with system size, which makes direct application of the architecture infeasible. Nevertheless, its explicit formulation allows statistical physics techniques to be used to derive new ARNNs for specific systems. As examples, new effective ARNN architectures are derived from two well-known mean-field systems, the Curie-Weiss and Sherrington-Kirkpatrick models; they approximate the Boltzmann distributions of the corresponding physical models better than other commonly used ARNN architectures. The connection established between the physics of the system and the ARNN architecture provides a way to derive new neural network architectures for different interacting systems and to interpret existing ones from a physical perspective.
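The autoregressive reformulation of the Boltzmann distribution can be illustrated on a very small system. The sketch below is not the architecture proposed in this work; it is a brute-force check, under stated assumptions, that the exact conditionals P(s_i | s_1, ..., s_{i-1}) multiply back to the Boltzmann probability and can be used for ancestral sampling. The spin convention (values ±1), the sign convention H(s) = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i, and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np


def boltzmann_table(J, h, beta=1.0):
    """Exact Boltzmann probabilities over all 2^N configurations of +/-1 spins.

    Assumes J is symmetric with zero diagonal and
    H(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i (illustrative convention).
    """
    n = len(h)
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    # 0.5 * s^T J s equals sum_{i<j} J_ij s_i s_j for symmetric, zero-diagonal J.
    energies = -0.5 * np.einsum("ci,ij,cj->c", configs, J, configs) - configs @ h
    weights = np.exp(-beta * energies)
    return configs, weights / weights.sum()


def conditional(configs, probs, prefix, value):
    """P(s_i = value | s_<i = prefix), obtained by summing over the remaining spins."""
    i = len(prefix)
    mask = np.ones(len(configs), dtype=bool)
    for k, v in enumerate(prefix):
        mask &= configs[:, k] == v
    joint = probs[mask & (configs[:, i] == value)].sum()
    return joint / probs[mask].sum()


def autoregressive_prob(configs, probs, s):
    """Product of conditionals: prod_i P(s_i | s_<i)."""
    p = 1.0
    for i in range(len(s)):
        p *= conditional(configs, probs, s[:i], s[i])
    return p


def ancestral_sample(configs, probs, rng):
    """Draw one configuration spin by spin from the exact conditionals."""
    s = []
    for _ in range(configs.shape[1]):
        p_up = conditional(configs, probs, s, 1)
        s.append(1 if rng.random() < p_up else -1)
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    J = np.triu(rng.normal(size=(n, n)), 1)
    J = J + J.T  # symmetric couplings, zero diagonal
    h = rng.normal(size=n)
    configs, probs = boltzmann_table(J, h)
    max_err = max(
        abs(autoregressive_prob(configs, probs, s) - p) for s, p in zip(configs, probs)
    )
    print("largest mismatch between the two forms:", max_err)
    print("one ancestral sample:", ancestral_sample(configs, probs, rng))
```

Each conditional here costs a sum over up to 2^N configurations, so the enumeration is only usable for a handful of spins. That exponential cost is the practical reason an exact representation is infeasible and approximate architectures are needed.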