Scaling laws, the power-law relations among loss, architecture size, and compute observed in modern neural networks, offer a quantitative way to characterize the complexity of a learning problem. The exponent governing the decay of the loss reflects how rapidly additional resources translate into improved accuracy, and thus how hard the target is to learn. Whether an analogous framework can characterize the complexity of physical problems remains open. We address this question for Neural-Network Quantum States, a leading variational approach for strongly correlated quantum many-body systems. Using transformer wave functions to approximate ground states of the $J_1$-$J_2$ Heisenberg model on triangular and square lattices with up to $20\times 20$ sites, we find that the $V$-score, a measure of the accuracy of a variational state, decays as a power law in training compute. Under an appropriate rescaling of compute, results for different system sizes collapse onto a single curve, analogous to scaling collapse in critical phenomena. The resulting power law is, to a good approximation, independent of the number of sites, showing that the transformer ansatz is size-consistent for the systems considered. The exponent decreases systematically with increasing frustration, which identifies the exponent as a quantitative measure of the representational difficulty of the ground state and establishes scaling laws as a general framework for benchmarking variational ansätze.
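As an illustration of how a power-law exponent of this kind can be extracted, the following minimal sketch fits $V \approx a\,C^{-\alpha}$ by linear least squares in log-log space. It is not the fitting procedure used in this work: the function name `fit_power_law`, the compute range, and the values `true_alpha` and `true_a` are hypothetical, and the data are synthetic rather than measured $V$-scores.

```python
import numpy as np


def fit_power_law(compute, v_score):
    """Fit v_score ~ a * compute**(-alpha) by least squares in log-log space.

    Returns (alpha, a). Hypothetical helper for illustration only.
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_v = np.log(np.asarray(v_score, dtype=float))
    # A power law is a straight line in log-log coordinates:
    # log V = log a - alpha * log C
    slope, intercept = np.polyfit(log_c, log_v, 1)
    return -slope, np.exp(intercept)


# Synthetic data (assumed values, not results from the paper).
rng = np.random.default_rng(0)
compute = np.logspace(12, 16, 9)      # hypothetical training-compute budgets
true_alpha, true_a = 0.5, 1.0e4       # assumed exponent and prefactor
noise = np.exp(rng.normal(0.0, 0.05, compute.size))  # multiplicative noise
v_score = true_a * compute ** (-true_alpha) * noise

alpha, a = fit_power_law(compute, v_score)
print(f"fitted exponent alpha = {alpha:.3f}, prefactor a = {a:.3e}")
```

In this picture, a smaller fitted `alpha` means that each additional unit of compute buys less accuracy, which is the sense in which the exponent quantifies how hard a ground state is to represent.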