Graphical models are effective tools for visualizing conditional dependencies between variables. However, as the number of variables grows, interpretation becomes increasingly difficult, and estimation uncertainty increases because the number of parameters is large relative to the number of observations. To address these challenges, we introduce the Clusterpath estimator of the Gaussian Graphical Model (CGGM), which encourages variable clustering in the graphical model in a data-driven way. An aggregation penalty groups variables together, which in turn yields a block-structured precision matrix whose block structure is preserved in the covariance matrix. The CGGM estimator is formulated as the solution to a convex optimization problem, making it easy to incorporate other popular penalization schemes, which we illustrate by combining an aggregation penalty with a sparsity penalty. We present a computationally efficient implementation of the CGGM estimator based on a cyclic block coordinate descent algorithm. In simulations, we show that CGGM not only matches but often outperforms other state-of-the-art methods for variable clustering in graphical models. We also demonstrate CGGM's practical advantages and versatility on a diverse collection of empirical applications.
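The claim that the block structure of the precision matrix carries over to the covariance matrix can be checked numerically. The sketch below is illustrative only and is not the CGGM estimator: it assumes a particular block parameterization (off-diagonal entries equal to `R[k, l]` for variables in clusters `k` and `l`, and one shared diagonal value `d[k]` per cluster), and the helper names `block_precision` and `is_block_structured` are hypothetical.

```python
import numpy as np


def block_precision(labels, R, d):
    """Build a block-structured precision matrix (assumed parameterization).

    Theta[i, j] = R[k, l] for i != j with i in cluster k and j in cluster l;
    Theta[i, i] = d[k] for i in cluster k.
    """
    labels = np.asarray(labels)
    U = np.eye(R.shape[0])[labels]      # p x K cluster membership matrix
    theta = U @ R @ U.T                 # fills every entry with its block value
    np.fill_diagonal(theta, np.asarray(d, dtype=float)[labels])
    return theta


def is_block_structured(M, labels, tol=1e-10):
    """Check that M is constant within each cluster block (diagonal handled separately)."""
    labels = np.asarray(labels)
    off_diag = ~np.eye(len(labels), dtype=bool)
    clusters = np.unique(labels)
    for k in clusters:
        # Diagonal entries must agree within a cluster.
        diag_vals = np.diag(M)[labels == k]
        if diag_vals.max() - diag_vals.min() > tol:
            return False
        for l in clusters:
            # Off-diagonal entries must agree within each (k, l) block.
            mask = np.outer(labels == k, labels == l) & off_diag
            vals = M[mask]
            if vals.size and vals.max() - vals.min() > tol:
                return False
    return True


# Five variables in two clusters; values chosen so Theta is diagonally dominant.
labels = [0, 0, 0, 1, 1]
R = np.array([[0.5, -0.2],
              [-0.2, 0.3]])
d = [2.0, 1.5]

theta = block_precision(labels, R, d)
sigma = np.linalg.inv(theta)            # implied covariance matrix

print(is_block_structured(theta, labels))
print(is_block_structured(sigma, labels))
```

Under this assumed parameterization, both checks should agree: inverting the precision matrix changes the block values but not which entries share a value, so the clustering can be read off either matrix.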