Inferring directed acyclic graphs (DAGs) from data via Markov chain Monte Carlo (MCMC) is computationally challenging in moderate- to high-dimensional settings because the discrete space of DAGs grows super-exponentially with the number of nodes. To improve scalability, several recent MCMC-based graph inference methods restrict the search space to a subset of edges, at the cost of introducing error into the inference procedure. In this work, we derive sharp lower and upper bounds on the total variation distance between the unrestricted posterior distribution and the posterior distribution induced by a state-of-the-art restricted-search-space MCMC method. These bounds characterize the regimes in which the approximation error is negligible and those in which it is not. To reduce the error, we propose a flexible transdimensional MCMC sampler that allows the search space to expand or contract dynamically as the chain progresses. Rather than assuming a fixed restricted search space throughout, the sampler is defined by birth-and-death rates that induce a prior distribution on the set of search spaces. We outline an efficient implementation of the proposed algorithm and demonstrate its finite-sample performance through simulation studies.
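To give a sense of the super-exponential growth mentioned above, the following minimal sketch counts labeled DAGs on n nodes using Robinson's recurrence, a standard combinatorial result. It is an illustration only and is not part of the proposed sampler; the function name `count_dags` is a hypothetical choice made for this example.

```python
from math import comb


def count_dags(n: int) -> int:
    """Count labeled DAGs on n nodes with Robinson's recurrence.

    a(0) = 1 and, for m >= 1,
    a(m) = sum_{k=1}^{m} (-1)^(k+1) * C(m, k) * 2^(k(m-k)) * a(m-k).
    """
    counts = [1]  # a(0) = 1: the graph with no nodes
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1  # inclusion-exclusion sign
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


# Print the size of the unrestricted DAG space for a few node counts.
for n in range(1, 8):
    print(n, count_dags(n))
```

Already at five nodes the unrestricted space contains 29281 DAGs, which is the kind of growth that motivates restricting the search space in the first place.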