A directed acyclic graph (DAG) can be partitioned, or mapped, into clusters to support inference and make it more computationally efficient in Bayesian networks (BNs), Markov processes, and other models. However, optimal partitioning under an arbitrary cost function is challenging, especially in statistical inference, because the local cost of a cluster depends both on the nodes within that cluster and on the mapping of the clusters connected to it via parent and/or child nodes, which we call dependent clusters. We propose a novel algorithm, DCMAP, for optimal cluster mapping with dependent clusters. Given an arbitrarily defined, positive cost function based on the DAG, we show that DCMAP converges to find all optimal clusters and returns near-optimal solutions along the way. Empirically, we find that the algorithm is time-efficient for a dynamic BN (DBN) model of a seagrass complex system using a computation cost function. For the 25-node and 50-node DBNs, the search spaces contained $9.91\times 10^9$ and $1.51\times10^{21}$ possible cluster mappings, respectively. The first optimal solution was found at iteration 934 $(\text{95\% CI } 926, 971)$ and iteration 2256 $(2150, 2271)$, with a cost that was 4\% and 0.2\% of the naive heuristic cost, respectively.
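To make the idea of a cluster cost that depends on neighbouring clusters concrete, the following is a minimal brute-force sketch, not DCMAP itself. Everything in it is an assumption made for illustration: the four-node DAG, the function names, and the cost function, which charges each cluster $2^{(\text{nodes in cluster} + \text{distinct parents outside the cluster})}$ as a crude stand-in for a computation cost. It enumerates every set partition of the nodes, which may differ from how the search space reported above is counted, and it is only feasible for very small graphs.

```python
# Hypothetical toy DAG, given as node -> list of parent nodes.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def partitions(items):
    """Yield every way to split `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` in a cluster of its own...
        yield [[first]] + part
        # ...or add it to each existing cluster in turn.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def cluster_cost(cluster, parents):
    """Illustrative positive cost: 2 ** (cluster size + external parents).

    The external-parent term is what makes a cluster's cost depend on
    how its neighbours are mapped, not only on its own nodes.
    """
    members = set(cluster)
    external = {p for n in cluster for p in parents[n] if p not in members}
    return 2 ** (len(members) + len(external))


def mapping_cost(mapping, parents):
    """Total cost of a cluster mapping: the sum of its cluster costs."""
    return sum(cluster_cost(c, parents) for c in mapping)


def brute_force(parents):
    """Return (number of mappings, best cost, list of all optimal mappings)."""
    all_mappings = list(partitions(list(parents)))
    best = min(mapping_cost(m, parents) for m in all_mappings)
    optimal = [m for m in all_mappings if mapping_cost(m, parents) == best]
    return len(all_mappings), best, optimal


if __name__ == "__main__":
    n_mappings, best_cost, optimal = brute_force(PARENTS)
    print(n_mappings, best_cost, len(optimal))
    for mapping in optimal:
        print(mapping)
```

Even on this toy graph, several distinct mappings tie for the minimum cost, which is why finding all optimal mappings rather than a single one is a meaningful goal. The number of set partitions grows super-exponentially with the number of nodes, so exhaustive enumeration of this kind does not scale to the 25-node and 50-node models discussed above.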