Heterogeneity is a fundamental characteristic of cancer. To accommodate it, subgroup identification has been studied extensively, and existing approaches fall broadly into unsupervised and supervised analysis. Compared to unsupervised analysis, supervised approaches potentially hold greater clinical implications. Under the unsupervised framework, several network-based subgroup identification methods have been developed; by incorporating the interconnections among variables, they offer more comprehensive insights than methods restricted to means, variances, and other simple distributional summaries. However, research on supervised network-based subgroup identification remains limited. In this study, we develop a novel supervised Bayesian graphical model for jointly identifying multiple heterogeneous networks and subgroups. In the proposed model, heterogeneity is not only reflected in molecular data but also associated with a clinical outcome, and a novel similarity prior is introduced to accommodate similarities among the networks of different subgroups, which facilitates clinically meaningful biological network construction and subgroup identification. The consistency properties of the estimates are rigorously established, and an efficient algorithm is developed. Extensive simulation studies and a real-world application to TCGA data demonstrate the advantages of the proposed approach in terms of both subgroup and network identification.