Staged tree models extend Bayesian networks by incorporating context-specific dependencies through a stage-based structure. In this study, we present a new framework for estimating staged trees using hierarchical clustering on the probability simplex with simplex-based divergences. We conduct a thorough evaluation of several distance and divergence measures, including Total Variation, Hellinger, Fisher, and Kaniadakis, alongside various linkage methods such as Ward.D2, average, complete, and McQuitty. Simulation experiments reveal that Total Variation, especially when combined with Ward.D2 linkage, consistently produces staged trees with better model fit, structure recovery, and computational efficiency. We assess performance using relative Bayesian Information Criterion (BIC) and Hamming distance. Our findings indicate that although Backward Hill Climbing (BHC) delivers competitive outcomes, it incurs a significantly higher computational cost. In contrast, Total Variation divergence with Ward.D2 linkage achieves similar performance at a much lower computational cost, making it a more viable option for large-scale or time-sensitive tasks.
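To make the clustering idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each vertex at one level of the tree carries an estimated conditional probability vector, measures dissimilarity with Total Variation or Hellinger distance, and merges vertices into stages by agglomerative clustering. For brevity it uses average linkage rather than Ward.D2, and the number of stages is fixed in advance; the function names and the example probabilities are hypothetical.

```python
from itertools import combinations
from math import sqrt


def total_variation(p, q):
    """Total Variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return sqrt(0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)))


def cluster_stages(probs, n_stages, dist=total_variation):
    """Agglomerative clustering with average linkage (a simplification).

    probs: list of conditional probability vectors, one per tree vertex.
    Returns a list of stages, each a sorted list of vertex indices.
    """
    clusters = [[i] for i in range(len(probs))]

    def linkage(a, b):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist(probs[i], probs[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_stages:
        # Find and merge the closest pair of clusters.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical estimated transition probabilities at four vertices of one level.
probs = [
    [0.80, 0.20],
    [0.78, 0.22],
    [0.30, 0.70],
    [0.33, 0.67],
]
stages = cluster_stages(probs, n_stages=2)
print(stages)  # [[0, 1], [2, 3]]
```

Vertices 0 and 1 end up in one stage and vertices 2 and 3 in another, because their probability vectors are close on the simplex. Passing `dist=hellinger` swaps the dissimilarity measure without changing the clustering routine.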