Heterogeneity is a fundamental characteristic of cancer. To accommodate it, subgroup identification has been extensively studied, with methods broadly categorized as unsupervised or supervised. Compared to unsupervised analysis, supervised approaches potentially hold greater clinical implications. Under the unsupervised framework, several network-based subgroup identification methods have been developed; by incorporating the interconnections among variables, they offer more comprehensive insights than methods restricted to means, variances, and other simple distributional features. However, research on supervised network-based subgroup identification remains limited. In this study, we develop a novel supervised Bayesian graphical model for jointly identifying multiple heterogeneous networks and subgroups. In the proposed model, heterogeneity is not only reflected in the molecular data but is also associated with a clinical outcome. A novel similarity prior is introduced to accommodate similarities among the networks of different subgroups, which facilitates clinically meaningful biological network construction and subgroup identification. The consistency properties of the estimates are rigorously established, and an efficient algorithm is developed. Extensive simulation studies and an application to TCGA data demonstrate the advantages of the proposed approach in terms of both subgroup and network identification.