An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models

This paper studies the problem of learning Bayesian networks from continuous observational data, generated according to a linear Gaussian structural equation model. We consider an $\ell_0$-penalized maximum likelihood estimator for this problem which is known to have favorable statistical properties but is computationally challenging to solve, especially for medium-sized Bayesian networks. We propose a new coordinate descent algorithm to approximate this estimator and prove several remarkable properties of our procedure: the algorithm converges to a coordinate-wise minimum, and despite the non-convexity of the loss function, as the sample size tends to infinity, the objective value of the coordinate descent solution converges to the optimal objective value of the $\ell_0$-penalized maximum likelihood estimator. Finite-sample statistical consistency guarantees are also established. To the best of our knowledge, our proposal is the first coordinate descent procedure endowed with optimality and statistical guarantees in the context of learning Bayesian networks. Numerical experiments on synthetic and real data demonstrate that our coordinate descent method can obtain near-optimal solutions while being scalable.



Coordinate descent


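Coordinate descent is a gradient-free optimization algorithm. At each iteration, it performs a one-dimensional search along a single coordinate direction from the current point to find a local minimum of the function, cycling through the different coordinate directions over the course of the run. For non-separable functions, the algorithm may fail to reach the optimum within a small number of iterations. To speed up convergence, one can work in a more suitable coordinate system, for example a new coordinate system obtained by principal component analysis in which the coordinates are as uncorrelated as possible.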
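As a minimal sketch of the cyclic scheme described above (not the algorithm proposed in the paper), the following Python code minimizes a convex quadratic f(x) = 0.5 x^T A x - b^T x one coordinate at a time. The function name, the test matrix, and the stopping rule are illustrative assumptions. A is assumed to be symmetric positive definite, so each one-dimensional search has a closed-form solution.

```python
def coordinate_descent_quadratic(A, b, num_sweeps=200, tol=1e-12):
    """Cyclic coordinate descent for f(x) = 0.5 * x^T A x - b^T x.

    Assumption: A is symmetric positive definite, so the exact minimizer
    along coordinate i is obtained by solving df/dx_i = 0 for x_i.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(num_sweeps):
        largest_change = 0.0
        for i in range(n):
            # Contribution of the other coordinates, using their latest values.
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Exact one-dimensional minimization along coordinate i.
            new_value = (b[i] - off_diagonal) / A[i][i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        # Stop once a full sweep no longer moves any coordinate noticeably.
        if largest_change < tol:
            break
    return x


# Illustrative 2x2 problem; the exact minimizer solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_cd = coordinate_descent_quadratic(A, b)
```

For this quadratic the minimizer satisfies A x = b, so the result can be checked against the exact solution (1/11, 7/11). The off-diagonal entries of A couple the two coordinates, which is why several sweeps are needed; with a diagonal A (a fully separable function) a single sweep would suffice.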
