Tree graphs are routinely used in statistics. When estimating a Bayesian model with a tree component, sampling the posterior remains a core difficulty. Existing Markov chain Monte Carlo methods tend to rely on local moves, often leading to poor mixing. A promising approach is to instead directly sample spanning trees on an auxiliary graph. Current spanning tree samplers, such as the celebrated Aldous-Broder algorithm, predominantly rely on simulating random walks that are required to visit all the nodes of the graph. Such algorithms are prone to getting stuck in certain subgraphs. We formalize this phenomenon using the bottlenecks in the random walk's transition probability matrix. We then propose a novel fast-forwarded cover algorithm that can break free from bottlenecks. The core idea is a marginalization argument that leads to a closed-form expression which allows for fast-forwarding to the event of visiting a new node. Unlike many existing approximation algorithms, our algorithm yields exact samples. We demonstrate the enhanced efficiency of the fast-forwarded cover algorithm, and illustrate its application in fitting a Bayesian dendrogram model on a Massachusetts crimes and communities dataset.
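To make the random-walk baseline concrete, the following is a minimal sketch of the standard Aldous-Broder sampler on an unweighted, connected, undirected graph. It is not the fast-forwarded cover algorithm proposed here. The adjacency-dictionary representation, the function names `aldous_broder` and `barbell`, and the barbell test graph are illustrative assumptions, chosen only because two cliques joined by a single edge are a simple instance of the kind of bottleneck described above.

```python
import random


def aldous_broder(adj, rng=None, start=None):
    """Sample a spanning tree with the classical Aldous-Broder random walk.

    adj   : dict mapping each node to a list of its neighbours
            (assumed symmetric, unweighted, and connected).
    rng   : optional random.Random instance for reproducibility.
    start : optional starting node; a random node is used if omitted.

    Returns (tree_edges, n_steps), where n_steps counts every walk step,
    including steps that revisit already-covered nodes.
    """
    if rng is None:
        rng = random.Random()
    nodes = list(adj)
    current = start if start is not None else rng.choice(nodes)
    visited = {current}
    tree_edges = []
    n_steps = 0
    # Walk until every node has been visited (the cover time of the walk).
    while len(visited) < len(nodes):
        nxt = rng.choice(adj[current])
        n_steps += 1
        if nxt not in visited:
            # The first-entrance edge into a new node joins the tree.
            visited.add(nxt)
            tree_edges.append((current, nxt))
        current = nxt
    return tree_edges, n_steps


def barbell(k):
    """Hypothetical bottleneck graph: two k-cliques joined by one edge."""
    adj = {i: [] for i in range(2 * k)}
    for block in (range(k), range(k, 2 * k)):
        for u in block:
            for v in block:
                if u != v:
                    adj[u].append(v)
    # Single bridge edge between the two cliques.
    adj[k - 1].append(k)
    adj[k].append(k - 1)
    return adj


if __name__ == "__main__":
    graph = barbell(10)
    edges, steps = aldous_broder(graph, rng=random.Random(0))
    print(f"tree edges: {len(edges)}, walk steps: {steps}")
```

Only the steps that reach a previously unvisited node add an edge to the tree, so `n_steps` minus the number of tree edges measures the work spent revisiting covered nodes. On graphs like the barbell, the walk has to find the single bridge edge to leave the first clique, which is the wasted effort that fast-forwarding to the next new-node visit is meant to avoid.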