In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined from the non-zero entries of the inverse covariance matrix of the data. Due to the high-dimensional nature of such problems, integrative methods that leverage similarities between multiple graphical structures have become increasingly popular. The joint graphical lasso is a powerful tool for this purpose. However, the AIC-based criterion currently used to tune the sparsity and similarity of the networks performs poorly in high-dimensional settings. We propose stabJGL, which equips the joint graphical lasso with a stable and accurate penalty parameter selection approach that combines the notion of model stability with likelihood-based similarity selection. The resulting method makes the joint graphical lasso usable in omics settings, and outperforms the standard joint graphical lasso, as well as state-of-the-art joint methods, on all performance measures we consider. Applying stabJGL to proteomic data from a pan-cancer study, we demonstrate the method's potential for novel discoveries. A user-friendly R package for stabJGL with tutorials is available on GitHub at https://github.com/Camiling/stabJGL.
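
To make the link between the inverse covariance matrix and the network concrete, the following is a minimal Python sketch, not part of stabJGL and not its API. It assumes a small hand-picked precision matrix for a four-node chain graph; the helper name `adjacency_from_precision` and the tolerance `tol` are hypothetical choices made for illustration. It shows that the covariance matrix is dense while the edges can be read off the non-zero off-diagonal entries of its inverse.

```python
import numpy as np


def adjacency_from_precision(theta, tol=1e-8):
    """Boolean adjacency matrix: edge (i, j) iff |theta[i, j]| > tol and i != j.

    Hypothetical helper for illustration; `tol` absorbs floating-point noise.
    """
    theta = np.asarray(theta, dtype=float)
    adj = np.abs(theta) > tol
    np.fill_diagonal(adj, False)  # self-loops are not edges
    return adj


# Assumed toy precision matrix for the chain graph 0 - 1 - 2 - 3.
# It is symmetric and diagonally dominant, hence positive definite.
theta = np.array([
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, -0.8],
    [0.0, 0.0, -0.8, 2.0],
])

# The implied covariance matrix is dense: all variables are marginally correlated.
sigma = np.linalg.inv(theta)

# Inverting the covariance recovers the sparse conditional dependence structure.
recovered = np.linalg.inv(sigma)
adj = adjacency_from_precision(recovered)
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if adj[i, j]]
print(edges)
```

In practice the precision matrix is unknown and must be estimated from data, which is where penalized estimators such as the graphical lasso and the joint graphical lasso come in; this sketch only illustrates how a network is read from a given precision matrix.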