A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width

We revisit the problem of efficiently learning the underlying parameters of Ising models from data. Current algorithmic approaches achieve essentially optimal sample complexity when given i.i.d. samples from the stationary measure and when the underlying model satisfies "width" bounds on the total $\ell_1$ interaction involving each node. We show that a simple existing approach based on node-wise logistic regression provably succeeds at recovering the underlying model in several new settings where these assumptions are violated: (1) Given dynamically generated data from a wide variety of local Markov chains, like block or round-robin dynamics, logistic regression recovers the parameters with optimal sample complexity up to $\log\log n$ factors. This generalizes the specialized algorithm of Bresler, Gamarnik, and Shah [IEEE Trans. Inf. Theory'18] for structure recovery in bounded-degree graphs from Glauber dynamics. (2) For the Sherrington-Kirkpatrick model of spin glasses, given $\mathsf{poly}(n)$ independent samples, logistic regression recovers the parameters in most of the known high-temperature regime via a simple reduction to weaker structural properties of the measure. This improves on recent work of Anari, Jain, Koehler, Pham, and Vuong [arXiv'23], which gives distribution learning at higher temperature. (3) As a simple byproduct of our techniques, logistic regression achieves an exponential improvement in learning from samples in the M-regime of data considered by Dutt, Lokhov, Vuffray, and Misra [ICML'21], as well as novel guarantees for learning from the adversarial Glauber dynamics of Chin, Moitra, Mossel, and Sandon [arXiv'23]. Our approach thus significantly generalizes the elegant analysis of Wu, Sanghavi, and Dimakis [NeurIPS'19] without any algorithmic modification.
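To make the node-wise logistic regression idea concrete, the following is a minimal sketch in plain Python, not the algorithm as analyzed in the paper. It assumes the parametrization $P(x) \propto \exp(\sum_{i<j} A_{ij} x_i x_j + \sum_i h_i x_i)$ with $x \in \{\pm 1\}^n$, under which $P(x_i = +1 \mid x_{-i}) = \sigma(2(\sum_j A_{ij} x_j + h_i))$, so regressing each spin on the others estimates one row of $A$. The function names (`sample_ising`, `fit_node`, `learn_ising`), the toy parameters, and the use of unregularized gradient ascent are all illustrative assumptions; the analyses cited above may impose an $\ell_1$ constraint tied to the width bound, which this sketch omits. It uses i.i.d. samples drawn by brute-force enumeration, which is only feasible for tiny $n$.

```python
import itertools
import math
import random
from collections import Counter


def sample_ising(A, h, num_samples, rng):
    """Draw i.i.d. samples by enumerating all 2^n states (tiny n only)."""
    n = len(h)
    states = list(itertools.product([-1, 1], repeat=n))
    weights = []
    for x in states:
        pair = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
        field = sum(h[i] * x[i] for i in range(n))
        weights.append(math.exp(pair + field))
    return rng.choices(states, weights=weights, k=num_samples)


def fit_node(counts, i, n, steps=500, lr=0.2):
    """Logistic regression of spin i on the other spins via gradient ascent.

    counts maps each observed state to its multiplicity, so one pass over
    the distinct states computes the full-batch gradient.
    Returns (w, b): w[j] estimates A[i][j] (w[i] stays 0), b estimates h[i].
    """
    w = [0.0] * n
    b = 0.0
    total = sum(counts.values())
    others = [j for j in range(n) if j != i]
    for _ in range(steps):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, c in counts.items():
            z = 2.0 * (sum(w[j] * x[j] for j in others) + b)
            # Derivative of log sigmoid(x_i * z) w.r.t. the inner sum,
            # weighted by the empirical frequency of this state.
            g = 2.0 * c * x[i] / (1.0 + math.exp(x[i] * z)) / total
            for j in others:
                grad_w[j] += g * x[j]
            grad_b += g
        for j in others:
            w[j] += lr * grad_w[j]
        b += lr * grad_b
    return w, b


def learn_ising(samples):
    """Run one regression per node, then symmetrize the coupling estimates."""
    n = len(samples[0])
    counts = Counter(samples)
    rows, h_hat = [], []
    for i in range(n):
        w, b = fit_node(counts, i, n)
        rows.append(w)
        h_hat.append(b)
    A_hat = [[0.5 * (rows[i][j] + rows[j][i]) for j in range(n)] for i in range(n)]
    return A_hat, h_hat


# Toy 4-node chain (hypothetical parameters chosen for illustration).
rng = random.Random(0)
A_true = [[0.0, 0.5, 0.0, 0.0],
          [0.5, 0.0, -0.4, 0.0],
          [0.0, -0.4, 0.0, 0.3],
          [0.0, 0.0, 0.3, 0.0]]
h_true = [0.1, 0.0, -0.2, 0.0]
samples = sample_ising(A_true, h_true, 20000, rng)
A_hat, h_hat = learn_ising(samples)
```

Each node's regression is independent of the others, so the procedure parallelizes trivially. Averaging the two estimates of each $A_{ij}$ is one simple way to combine them, chosen here for convenience.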
