We propose a globally optimal Bayesian network structure discovery algorithm based on a progressively leveled scoring approach. Bayesian network structure discovery is a fundamental yet NP-hard problem in probabilistic graphical models, and memory usage grows exponentially with the number of variables. The simple and effective method of Silander and Myllym\"aki, which incrementally computes local scores to reach a globally optimal network, has been widely applied in this field. However, existing methods that rely on disk storage, while able to handle networks with more variables, introduce latency, fragmentation, and the additional overhead of disk I/O. To avoid these problems, we explore how to further improve computational efficiency and reduce peak memory usage using main memory alone. We introduce an efficient hierarchical computation method that requires only a single traversal of all local structures and retains only the data needed for the current level of computation, which improves efficiency and significantly reduces memory requirements. Experimental results show that, using main memory alone, our method both reduces peak memory usage and improves computational efficiency compared with existing methods, scales well to larger networks, and produces stable results. Ultimately, we successfully process a Bayesian network with 28 variables entirely in main memory.
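The subset dynamic program that the abstract builds on can be sketched as follows. This is a minimal illustration of a Silander–Myllymäki-style recursion (best parent sets per candidate set, then best sinks per variable subset), not the paper's level-wise, memory-optimized implementation; `toy_score` is a synthetic stand-in for a data-based score such as BIC/MDL.

```python
from itertools import combinations

def exact_structure_dp(n, local_score):
    """Exact structure learning by dynamic programming over subsets:
    1) for every variable v and candidate set C, find the best parent
       set of v within C; 2) for every variable subset, choose the best
       sink, yielding a globally optimal network (score is MINIMIZED)."""
    best_ps = {}  # (v, candidate_mask) -> (best score, best parent mask)
    for v in range(n):
        others = [u for u in range(n) if u != v]
        for r in range(len(others) + 1):       # increasing |C|, so all
            for C in combinations(others, r):  # smaller sets are ready
                cmask = sum(1 << u for u in C)
                # Either use all of C as parents...
                best = (local_score(v, C), cmask)
                # ...or inherit the best parent set of a smaller candidate set.
                for u in C:
                    cand = best_ps[(v, cmask & ~(1 << u))]
                    if cand[0] < best[0]:
                        best = cand
                best_ps[(v, cmask)] = best

    # Sink DP: best_net[mask] = (total score, [(variable, parent_mask), ...])
    best_net = {0: (0.0, [])}
    for mask in range(1, 1 << n):
        best = None
        for v in range(n):
            if mask & (1 << v):
                rest = mask & ~(1 << v)
                s_v, pmask = best_ps[(v, rest)]  # v is a sink over 'rest'
                total = best_net[rest][0] + s_v
                if best is None or total < best[0]:
                    best = (total, best_net[rest][1] + [(v, pmask)])
        best_net[mask] = best
    return best_net[(1 << n) - 1]

def toy_score(v, parents):
    # Synthetic stand-in for a data-based local score (e.g. BIC/MDL):
    # each parent costs 0.3, but having v-1 as a parent saves 0.5.
    return 1.0 + 0.3 * len(parents) - (0.5 if (v - 1) in parents else 0.0)

total, structure = exact_structure_dp(4, toy_score)
```

Both tables are indexed by bitmasks over the variables; the paper's contribution concerns traversing these tables level by level (by subset size) so that only the current level's entries need to stay resident in memory.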