Polytrees are a subclass of Bayesian networks that capture the conditional dependencies among a set of $n$ variables as a directed forest; they are motivated by more efficient inference and improved interpretability. Since learning the best polytree is NP-hard, we study which restrictions make the problem more tractable, considering, for example, in-degree bounds, properties of the score functions that measure the quality of a polytree, and approximation algorithms. We devise an algorithm that finds an optimal polytree in time $O((2+ε)^n)$ for arbitrarily small $ε > 0$ and any constant in-degree bound $k$, improving on the fastest previously known algorithm, which runs in time $O(3^n)$. We further give polynomial-time algorithms that find a polytree whose score is within a factor of $k$ of the optimum for arbitrary scores and within a factor of $2$ for additive scores. Many of the results are complemented by (nearly) tight lower bounds on either the time complexity or the approximation factor.
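To make the object being learned concrete: a polytree is a directed graph whose underlying undirected graph has no cycles, optionally with at most $k$ parents per node. The following is a minimal sketch of a validity check only, not the learning algorithm described above; the function name `is_polytree`, the node numbering $0, \dots, n-1$, and the arc-list representation are assumptions made for illustration.

```python
def is_polytree(n, arcs, k=None):
    """Return True if the arcs over nodes 0..n-1 form a polytree.

    A polytree is a directed graph whose underlying undirected graph
    is a forest. If k is given, every node may have at most k parents.
    """
    parent = list(range(n))  # union-find over the undirected skeleton

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    in_degree = [0] * n
    for u, v in arcs:  # each pair is an arc u -> v
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this arc would close an undirected cycle
        parent[ru] = rv  # merge the two components
        in_degree[v] += 1
        if k is not None and in_degree[v] > k:
            return False  # in-degree bound violated
    return True
```

For instance, the arcs 0→1, 1→2, 0→2 form a valid directed acyclic graph but not a polytree, because the undirected skeleton contains a triangle, whereas the v-structure 0→2, 1→2 is a polytree with maximum in-degree 2.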