Profile Graphical Models

We introduce a novel class of graphical models, termed profile graphical models, that represent, within a single graph, how an external factor influences the dependence structure of a set of random variables. This class is quite general and includes multiple graphs and chain graphs as special cases. Profile graphical models capture the conditional distributions of a multivariate random vector given different levels of a risk factor, and learn how the conditional independence structure among the variables may vary across these risk profiles. We formally define this family of models and establish their corresponding Markov properties. We derive key structural and probabilistic properties that underpin a more powerful inferential framework than existing approaches, underscoring that our contribution extends beyond a novel graphical representation. Furthermore, we show that the resulting profile undirected graphical models are independence-compatible with two-block LWF chain graph models. We then develop a Bayesian approach for Gaussian undirected profile graphical models, based on continuous spike-and-slab priors, to learn shared sparsity structures across different levels of the risk factor. We also design a fast EM algorithm for efficient inference. Inferential properties are explored through simulation studies, including comparisons with competing methods. The practical utility of this class of models is demonstrated through the analysis of protein network data from various subtypes of acute myeloid leukemia. Compared with competing methods, our approach yields a more parsimonious network and reveals greater patient heterogeneity, highlighting its enhanced ability to capture subject-specific differences.
