Bypassing the Noisy Parity Barrier: Learning Higher-Order Markov Random Fields from Dynamics

We consider the problem of learning graphical models, also known as Markov random fields (MRFs), from temporally correlated samples. As in many traditional statistical settings, the fundamental results in this area all assume independent samples from the distribution. However, such samples generally do not correspond to more realistic observations from nature, which instead evolve according to some stochastic process. Through the computational lens, even generating a single sample from the true MRF distribution is intractable unless $\mathsf{NP}=\mathsf{RP}$; moreover, any algorithm that learns from i.i.d. samples requires prohibitive runtime due to hardness reductions from the parity with noise problem. These computational barriers to sampling and learning in the i.i.d. setting severely lessen the utility of these breakthrough results for this important task; however, dropping the i.i.d. assumption typically only introduces further algorithmic and statistical complexities. In this work, we demonstrate that direct trajectory data from a natural evolution of the MRF overcomes the fundamental computational lower bounds to efficient learning. In particular, we show that given a trajectory with $\widetilde{O}_k(n)$ site updates of an order-$k$ MRF from the Glauber dynamics, a well-studied, natural stochastic process on graphical models, there is an algorithm that recovers the graph and the parameters in $\widetilde{O}_k(n^2)$ time. By contrast, all prior algorithms for learning order-$k$ MRFs inherently suffer from $n^{\Theta(k)}$ runtime even in sparse instances due to the reductions from sparse parity with noise. Our results thus surprisingly show that this more realistic, but intuitively less tractable, model for MRFs actually leads to efficiency far beyond what is known and believed to be true in the traditional i.i.d. case.
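To make the data model concrete, the following is a minimal sketch of how a Glauber dynamics trajectory could be generated for a small binary MRF. It illustrates only the data-generating process, not the learning algorithm, whose details the abstract does not give. Several things here are assumptions made for illustration: spins take values in $\{-1,+1\}$, the distribution has the form $p(x) \propto \exp\big(\sum_S \theta_S \prod_{j \in S} x_j\big)$ with each set $S$ of size at most $k$, each step updates a uniformly random site, and the names `local_field`, `glauber_trajectory`, and `terms` are hypothetical.

```python
import math
import random


def local_field(x, i, terms):
    """Sum of theta_S * prod_{j in S, j != i} x[j] over all terms S containing i.

    `terms` is a hypothetical representation: a list of (sites, theta) pairs,
    where `sites` is a tuple of distinct site indices.
    """
    total = 0.0
    for sites, theta in terms:
        if i in sites:
            prod = 1.0
            for j in sites:
                if j != i:
                    prod *= x[j]
            total += theta * prod
    return total


def glauber_trajectory(n, terms, num_updates, rng):
    """Run single-site Glauber dynamics and record every site update.

    Assumed update rule: pick a site i uniformly at random, then resample
    x[i] from its conditional distribution given all other spins.
    Returns the final configuration and the list of (site, new_value) pairs.
    """
    x = [rng.choice((-1, 1)) for _ in range(n)]  # arbitrary random start
    trajectory = []
    for _ in range(num_updates):
        i = rng.randrange(n)
        h = local_field(x, i, terms)
        # P(x_i = +1 | rest) = e^h / (e^h + e^-h) = 1 / (1 + e^(-2h))
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * h))
        x[i] = 1 if rng.random() < p_plus else -1
        trajectory.append((i, x[i]))
    return x, trajectory


# Illustrative order-3 model on 4 sites; the parameter values are made up.
example_terms = [((0, 1, 2), 0.8), ((2, 3), -0.5), ((1,), 0.3)]
final_state, updates = glauber_trajectory(4, example_terms, 50, random.Random(0))
```

Each entry of `updates` records which site was resampled and the value it took, which is the kind of temporally correlated observation the abstract contrasts with i.i.d. samples from the stationary distribution.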
