We study \emph{learning-to-sample} -- a basic algorithmic task underlying generative modeling -- for Ising models, a standard testbed for algorithmic ideas in both theoretical computer science and machine learning. Given i.i.d. samples from an unknown target distribution, the goal of learning-to-sample is to learn a computationally efficient generation procedure that produces new samples following approximately the same distribution. We construct a family of Ising models whose width is bounded by a constant and that lie just beyond the spectral threshold $\lambda_{\max}(J)-\lambda_{\min}(J)=1$, and show that learning-to-sample for this family is computationally hard under standard cryptographic assumptions, even when the learner is given both polynomially many i.i.d. samples from the model and explicit access to its parameters. Combined with results of [AJKPV24,KLV25] showing tractability of learning-to-sample below the spectral threshold, this establishes a sharp computational phase transition at the spectral threshold. Moreover, combined with prior results on parameter learning for bounded-width Ising models [KM17,WSD19,VML20], this shows that learning-to-sample can be strictly harder than parameter learning. Finally, we show that any efficient learner for these hard instances exhibits a natural memorization-hallucination dichotomy: the learner must either output configurations that, after a simple transformation, match the (transformed) training data, or place substantial mass on configurations that have negligible probability under the target distribution.
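To make the spectral condition concrete, the following minimal sketch computes the spectral width $\lambda_{\max}(J)-\lambda_{\min}(J)$ of a symmetric interaction matrix $J$ and compares it with the threshold $1$. It is an illustration only, not the construction used in the paper: the helper names `top_eigenvalue`, `spectral_width`, and `below_spectral_threshold` are hypothetical, and the eigenvalues are approximated by plain power iteration on Gershgorin-shifted matrices so that the code needs only the Python standard library.

```python
import math
import random


def top_eigenvalue(M, iters=500, seed=0):
    """Approximate the largest eigenvalue of a symmetric PSD matrix M
    by power iteration (hypothetical helper for this illustration)."""
    n = len(M)
    rng = random.Random(seed)
    # Random start vector; it is non-orthogonal to the top eigenspace
    # with probability 1.
    v = [rng.uniform(0.5, 1.5) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return lam


def spectral_width(J):
    """Return an approximation of lambda_max(J) - lambda_min(J)
    for a symmetric matrix J given as a list of lists."""
    n = len(J)
    # Gershgorin bound: every eigenvalue of J lies in [-c, c].
    c = max(sum(abs(x) for x in row) for row in J)
    shifted_up = [[J[i][j] + (c if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    shifted_down = [[(c if i == j else 0.0) - J[i][j] for j in range(n)]
                    for i in range(n)]
    lam_max = top_eigenvalue(shifted_up) - c      # J + cI is PSD
    lam_min = c - top_eigenvalue(shifted_down)    # cI - J is PSD
    return lam_max - lam_min


def below_spectral_threshold(J):
    """True if the spectral width of J is strictly less than 1."""
    return spectral_width(J) < 1.0


# Two spins with coupling a: eigenvalues are +a and -a, so the width is 2a.
J_weak = [[0.0, 0.4], [0.4, 0.0]]
J_strong = [[0.0, 0.6], [0.6, 0.0]]
print(below_spectral_threshold(J_weak))
print(below_spectral_threshold(J_strong))
```

Note that a spectral width above $1$ does not by itself mean that a particular model is hard to sample from; the hardness statement concerns the specific family constructed in this work, whereas the tractability results of [AJKPV24,KLV25] apply below the threshold.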