We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data. Building on distributionally robust optimization and a regression approach, we propose to minimize the worst-case risk over a family of distributions within a bounded Wasserstein distance or KL divergence of the empirical distribution. The worst-case risk accounts for the effect of outliers. The proposed approach applies to general categorical random variables without assuming faithfulness, an ordinal relationship, or a specific form of conditional distribution. We present efficient algorithms and show that the proposed methods are closely related to the standard regularized regression approach. Under mild assumptions, we derive non-asymptotic guarantees for successful structure learning with logarithmic sample complexities for bounded-degree graphs. Numerical studies on synthetic and real datasets validate the effectiveness of our method. Code is available at https://github.com/DanielLeee/drslbn.
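To make the notion of a worst-case risk over a KL ball concrete, the sketch below evaluates it for a fixed vector of per-sample losses using the standard one-dimensional dual, sup over Q with KL(Q || P_n) <= eps of E_Q[loss] = inf over gamma > 0 of gamma * eps + gamma * log E_{P_n}[exp(loss / gamma)]. This is only an illustration under stated assumptions and is not taken from the paper's algorithms or its repository: the function name `kl_worst_case_risk`, the golden-section search over log gamma, and the example losses are all hypothetical choices.

```python
import math


def kl_worst_case_risk(losses, eps, iters=200):
    """Approximate sup_{Q: KL(Q || P_n) <= eps} E_Q[loss] through its dual.

    `losses` are per-sample losses under the empirical distribution P_n.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(losses)
    top = max(losses)

    def dual(log_gamma):
        gamma = math.exp(log_gamma)
        # Stable log-mean-exp: shift by the max loss so every exponent is <= 0.
        lme = math.log(sum(math.exp((l - top) / gamma) for l in losses) / n)
        return gamma * eps + top + gamma * lme

    # The dual is convex in gamma, hence unimodal in log(gamma);
    # golden-section search over log(gamma) is an assumed, simple solver.
    lo, hi = -20.0, 20.0
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if dual(a) < dual(b):
            hi = b
        else:
            lo = a
    return dual((lo + hi) / 2)


# Illustrative losses with one outlier-like value.
example_losses = [0.2, 0.5, 1.0, 3.0]
nominal = sum(example_losses) / len(example_losses)
robust = kl_worst_case_risk(example_losses, eps=0.1)
```

With `eps = 0` the value reduces to the empirical mean, and for a large enough radius it approaches the largest observed loss, so the radius controls how much weight the adversary can shift toward extreme samples.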