This paper studies the problem of learning Bayesian networks from continuous observational data generated according to a linear Gaussian structural equation model. We consider an $\ell_0$-penalized maximum likelihood estimator for this problem, which is known to have favorable statistical properties but is computationally challenging to solve, even for medium-sized Bayesian networks. We propose a new coordinate descent algorithm to approximate this estimator and prove several notable properties of our procedure: the algorithm converges to a coordinate-wise minimum and, despite the non-convexity of the loss function, the objective value of the coordinate descent solution converges to the optimal objective value of the $\ell_0$-penalized maximum likelihood estimator as the sample size tends to infinity. We also establish finite-sample statistical consistency guarantees. To the best of our knowledge, our proposal is the first coordinate descent procedure endowed with optimality and statistical guarantees in the context of learning Bayesian networks. Numerical experiments on synthetic and real data demonstrate that our coordinate descent method can obtain near-optimal solutions while being scalable.
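For concreteness, the estimator referred to above can be sketched as follows; this is a minimal illustration under the simplifying assumption of equal noise variances (so the Gaussian negative log-likelihood reduces to a least-squares term), and the notation $\mathbf{X}$, $B$, $\lambda$, $\mathcal{G}(B)$ is introduced here for exposition rather than taken from the paper. Given $n$ observations of $p$ variables collected in a data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, one seeks a coefficient matrix
$$
\hat{B} \;\in\; \operatorname*{arg\,min}_{B \,:\; \mathcal{G}(B)\ \text{is a DAG}} \;\; \frac{1}{2n}\,\lVert \mathbf{X} - \mathbf{X}B \rVert_F^2 \;+\; \lambda\,\lVert B \rVert_0 ,
$$
where $\lVert B \rVert_0$ counts the nonzero entries of $B$ and $\mathcal{G}(B)$ denotes the directed graph induced by the support of $B$. In this sketch, a coordinate descent scheme would cyclically update one entry $B_{ij}$ at a time, holding the remaining entries fixed, which is the type of procedure the abstract describes at a high level.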