Standard parametric and semi-parametric regression methods are inadequate for large clustered survival studies when the functional forms of the covariates and their interactions in the hazard function are unknown and when the random cluster effects and cluster-level covariates are spatially correlated. We present a general nonparametric method for such studies based on Soft Bayesian Additive Regression Trees, a Bayesian ensemble learning approach. The methodological and computational challenges include a large number of clusters, variable cluster sizes, and proper statistical augmentation of an unobservable cluster-level covariate using a data registry separate from the main survival study. We address the computational challenges with an innovative 3-step approach based on latent variables. We illustrate the method and its advantages over existing methods by assessing the impact of interventions on selected county-level and patient-level covariates to mitigate the existing racial disparity in breast cancer survival across 67 Florida counties (clusters), using two data sources. The Florida Cancer Registry (FCR) provides clustered survival data with patient-level covariates, and the Behavioral Risk Factor Surveillance System (BRFSS) provides additional information on an unobservable county-level covariate, Screening Mammography Utilization (SMU).