A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support inference and make it more computationally efficient in Bayesian Networks (BNs), Markov processes and other models. However, optimal partitioning with an arbitrary cost function is challenging, especially in statistical inference, because the local cost of a cluster depends both on the nodes within that cluster and on the mapping of the clusters connected to it via parent and/or child nodes, which we call dependent clusters. We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters. Given an arbitrarily defined, positive cost function based on the DAG, we show that DCMAP converges to find all optimal clusters, and returns near-optimal solutions along the way. Empirically, we find that the algorithm is time-efficient for a Dynamic BN (DBN) model of a seagrass complex system using a computation cost function. For the 25-node and 50-node DBNs, the search spaces contained $9.91\times 10^9$ and $1.51\times10^{21}$ possible cluster mappings, respectively. The first optimal solution was found at iteration 934 $(\text{95\% CI } 926,971)$ and iteration 2256 $(2150,2271)$, with a cost that was 4\% and 0.2\% of the naive heuristic cost, respectively.
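To make the notion of dependent clusters concrete, the following is a minimal brute-force sketch, not the DCMAP algorithm itself. The four-node DAG, the names `parents`, `cluster_cost` and `brute_force`, and the cost function (two raised to the number of nodes in a cluster plus its parents in other clusters) are all assumptions chosen for illustration. The sketch enumerates every unconstrained partition of the nodes, which is not necessarily how the search space sizes quoted above are counted, and exhaustive enumeration of this kind quickly becomes infeasible as the DAG grows.

```python
from itertools import product

# Hypothetical 4-node DAG (illustrative only): parents[child] = its parent nodes.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
nodes = list(parents)


def cluster_cost(mapping, k):
    """Assumed cost of cluster k: 2 ** (members + parents mapped to other clusters).

    The external parents make this cost depend on how neighbouring
    clusters are mapped, i.e. on the dependent clusters.
    """
    members = {n for n in nodes if mapping[n] == k}
    external = {p for n in members for p in parents[n] if p not in members}
    return 2 ** len(members | external)


def total_cost(mapping):
    """Sum of the (strictly positive) local costs over all clusters."""
    return sum(cluster_cost(mapping, k) for k in set(mapping.values()))


def canonical(labels):
    """Relabel clusters by first appearance so each partition is counted once."""
    seen = {}
    return tuple(seen.setdefault(label, len(seen)) for label in labels)


def brute_force():
    """Enumerate every partition of the nodes and return all minimum-cost ones."""
    labelings = product(range(len(nodes)), repeat=len(nodes))
    mappings = sorted({canonical(labels) for labels in labelings})
    costs = {m: total_cost(dict(zip(nodes, m))) for m in mappings}
    best_cost = min(costs.values())
    optimal = [dict(zip(nodes, m)) for m in mappings if costs[m] == best_cost]
    return len(mappings), best_cost, optimal


n_mappings, best_cost, optimal = brute_force()
print(n_mappings, best_cost, optimal)
```

Even in this toy example several mappings tie for the minimum cost, which is one reason an algorithm that finds all optimal clusters, rather than a single one, can be useful.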