Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications, but it poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships that hold in the joint distribution of the variables. The dual PC algorithm is a novel scheme for carrying out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions, we can simultaneously perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and then progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies to Gaussian copula models, and we demonstrate its performance in that setting.
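The covariance/precision duality behind the complementary conditioning sets can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`partial_corr_from_cov`, `partial_corr_dual`, `random_covariance`) are hypothetical, and the example works with a known population covariance matrix rather than an estimate from data. It relies only on the standard block-inversion identity: inverting the sub-block of the precision matrix indexed by {i, j} ∪ S gives the covariance of those variables conditional on all remaining variables.

```python
import numpy as np


def partial_corr_from_cov(sigma, i, j, cond):
    """Partial correlation of variables i and j given `cond`,
    computed by inverting a sub-block of the covariance matrix."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(sigma[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])


def partial_corr_dual(omega, i, j, cond):
    """Partial correlation of i and j given the COMPLEMENT of `cond`
    (all variables other than i, j and those in `cond`),
    computed by inverting a sub-block of the precision matrix."""
    idx = [i, j] + list(cond)
    # Inverse of the precision sub-block = covariance of idx
    # conditional on every variable outside idx.
    cov = np.linalg.inv(omega[np.ix_(idx, idx)])
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


def random_covariance(p, seed=0):
    """Hypothetical helper: a random symmetric positive-definite matrix."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((p, p))
    return a @ a.T + p * np.eye(p)


if __name__ == "__main__":
    sigma = random_covariance(6, seed=1)
    omega = np.linalg.inv(sigma)  # precision matrix

    # Conditioning set S = {2}; its dual is {3, 4, 5}.
    direct = partial_corr_from_cov(sigma, 0, 1, [3, 4, 5])
    dual = partial_corr_dual(omega, 0, 1, [2])
    print("given {3,4,5} via covariance:", direct)
    print("given {3,4,5} via precision: ", dual)
```

With an empty conditioning set, the first function returns the marginal correlation and the second returns the full-order partial correlation, which matches the abstract's description of starting from marginal and full-order relationships. In both functions the matrix being inverted has size |S| + 2, so a single set S serves two tests of complementary order.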