Graphical models are effective tools for visualizing conditional dependencies between variables. However, as the number of variables grows, the graph becomes increasingly difficult to interpret, and estimation uncertainty increases because the number of parameters is large relative to the number of observations. To address these challenges, we introduce the Clusterpath estimator of the Gaussian Graphical Model (CGGM), which encourages variable clustering in the graphical model in a data-driven way. The clusterpath penalty groups variables together, which yields a block-structured precision matrix whose block structure is preserved in the covariance matrix. We implement the CGGM estimator efficiently using a cyclic block coordinate descent algorithm. In simulations, we show that CGGM matches, and often outperforms, other state-of-the-art methods for variable clustering in graphical models. We also demonstrate CGGM's practical advantages and versatility on a diverse collection of empirical applications.
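To illustrate why a block structure in the precision matrix carries over to the covariance matrix, the sketch below assumes one illustrative parametrization: Theta = diag(a[labels]) + U R U^T, where U is a 0/1 cluster-membership matrix, R is a symmetric K x K matrix, and a holds one diagonal value per cluster. This form, and the helper names `membership_matrix`, `block_precision`, and `has_block_structure`, are hypothetical choices for this example and not necessarily how CGGM represents its estimate internally. Under this assumption, the Woodbury identity shows that the inverse has the same form, and the code checks that numerically.

```python
import numpy as np

def membership_matrix(labels, n_clusters):
    """Hypothetical helper: p x K 0/1 matrix with U[j, k] = 1 iff variable j is in cluster k."""
    U = np.zeros((len(labels), n_clusters))
    U[np.arange(len(labels)), labels] = 1.0
    return U

def block_precision(labels, a, R):
    """Assumed parametrization: Theta = diag(a[labels]) + U R U^T."""
    U = membership_matrix(labels, len(a))
    return np.diag(a[labels]) + U @ R @ U.T

def has_block_structure(M, labels, tol=1e-9):
    """Check whether M = diag(d[labels]) + U B U^T for some d and B.

    Equivalently: diagonal entries are constant within each cluster, and all
    off-diagonal entries linking cluster k to cluster l share one value.
    """
    labels = np.asarray(labels)
    p = len(labels)
    off_diag = ~np.eye(p, dtype=bool)
    clusters = np.unique(labels)
    for k in clusters:
        in_k = labels == k
        if np.ptp(np.diag(M)[in_k]) > tol:
            return False
        for l in clusters:
            mask = np.outer(in_k, labels == l) & off_diag
            vals = M[mask]
            if vals.size and np.ptp(vals) > tol:
                return False
    return True

# Six variables in three clusters (made-up values for illustration only).
labels = np.array([0, 0, 0, 1, 1, 2])
a = np.array([1.0, 2.0, 1.5])
R = np.array([[0.5, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.5]])

Theta = block_precision(labels, a, R)   # positive definite: a > 0 and R is PSD
Sigma = np.linalg.inv(Theta)            # implied covariance matrix

print(has_block_structure(Theta, labels), has_block_structure(Sigma, labels))
```

Perturbing a single within-cluster entry of `Theta` breaks the pattern, so `has_block_structure` then returns `False`. The check works entry-wise rather than by refitting U, B and d, which keeps it independent of the assumed parametrization.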