Score-based methods for learning Bayesian networks (BNs) aim to maximize a global score function. However, when local variables are related by both direct and indirect dependence, global optimization of the score function misses edges between indirectly dependent variables, because their scores are smaller than those of directly dependent variables. In this paper, we present an identifiability condition, based on a determined subset of parents, for identifying the underlying DAG. Using this condition, we develop a two-phase algorithm, the optimal-tuning (OT) algorithm, that locally amends the result of the global optimization. In the optimal phase, an optimization problem based on the first-order Hilbert-Schmidt independence criterion (HSIC) yields an estimated skeleton, which serves as the initial determined parent subset. In the tuning phase, the skeleton is locally tuned by deletion, addition, and DAG-formalization strategies that rely on theoretically proved incremental properties of high-order HSIC. Numerical experiments on a range of synthetic and real-world datasets show that the OT algorithm outperforms existing methods. In particular, on the Sigmoid Mix model with graph size $d=40$, the structural intervention distance (SID) of the OT algorithm is 329.7 lower than that of CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges than the one estimated by CAM. Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
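For readers unfamiliar with HSIC, the following is a minimal sketch of the standard biased empirical estimator with Gaussian kernels. It is not the OT algorithm, and it does not reproduce the first-order or high-order HSIC quantities defined in the paper. The function names `gaussian_gram` and `hsic`, the median-heuristic bandwidth, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix of the samples in x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)  # treat 1-D input as n samples of dimension 1
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if sigma is None:
        # Assumption: median heuristic for the bandwidth.
        positive = d2[d2 > 0]
        sigma = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy comparison: a nonlinear dependent pair versus an independent pair.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dep = np.tanh(2.0 * x) + 0.1 * rng.normal(size=300)
y_ind = rng.normal(size=300)

hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)
print(f"HSIC(x, y_dep) = {hsic_dep:.4f}")
print(f"HSIC(x, y_ind) = {hsic_ind:.4f}")
```

In this sketch the dependent pair receives a clearly larger HSIC value than the independent pair, which is the property that makes HSIC usable as a dependence score between candidate parent and child variables.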