In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined from the non-zero entries of the inverse covariance matrix of the data. Due to the high-dimensional nature of such problems, integrative methods that leverage similarities between multiple graphical structures have become increasingly popular. The joint graphical lasso is a powerful tool for this purpose; however, the AIC-based selection criterion currently used to tune the sparsity and similarity of the networks performs poorly in high-dimensional settings. We propose stabJGL, which equips the joint graphical lasso with a stable and accurate penalty parameter selection approach that combines the notion of model stability with likelihood-based similarity selection. The resulting method makes the joint graphical lasso usable in omics settings and outperforms both the standard joint graphical lasso and state-of-the-art joint methods on all performance measures we consider. Applying stabJGL to proteomic data from a pan-cancer study, we demonstrate the method's potential for novel discoveries. A user-friendly R package for stabJGL with tutorials is available on GitHub at https://github.com/Camiling/stabJGL.
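To illustrate how a network is read off the inverse covariance matrix under a Gaussian graphical model, the following is a minimal sketch in Python. It is not part of stabJGL or its R package: the function name `edges_from_precision`, the tolerance `tol`, the gene labels, and the 4-by-4 precision matrix are all hypothetical and chosen only for illustration. An edge is reported for each pair of variables whose off-diagonal precision entry is non-zero.

```python
from itertools import combinations


def edges_from_precision(theta, names, tol=1e-8):
    """List the edges implied by the non-zero off-diagonal entries of theta.

    theta : symmetric precision (inverse covariance) matrix as nested lists
    names : variable labels, one per row of theta
    tol   : entries with absolute value at or below tol count as zero
            (an assumed numerical threshold, not taken from the paper)
    """
    edges = []
    # Visit each unordered pair (i, j) with i < j exactly once.
    for i, j in combinations(range(len(names)), 2):
        if abs(theta[i][j]) > tol:
            edges.append((names[i], names[j]))
    return edges


# Hypothetical precision matrix for four genes (made up for illustration).
theta = [
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, 0.5, 0.0],
    [0.0, 0.5, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
genes = ["g1", "g2", "g3", "g4"]

edges = edges_from_precision(theta, genes)
print(edges)  # [('g1', 'g2'), ('g2', 'g3')]
```

In this toy example g4 has no non-zero off-diagonal entries, so it is isolated in the network. In practice the precision matrix is not known and must be estimated, which is where sparsity-inducing estimators such as the joint graphical lasso come in.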