We recently developed a new method, riAFT-BART, to draw causal inferences about population treatment effects on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond estimating the population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical problems with clustered survival data: estimating treatment effect heterogeneity and selecting variables. Leveraging likelihood-based machine learning, we describe how to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and how to use these samples in an exploratory treatment effect heterogeneity analysis to identify subpopulations whose treatment effects may differ from the population average effect. The literature on variable selection for clustered and censored survival data is sparse, particularly for methods that use flexible modeling techniques. We propose a permutation-based approach to variable selection that uses each predictor's variable inclusion proportion supplied by the riAFT-BART model. To address the missing data frequently encountered in health databases, we propose a strategy that combines bootstrap imputation with riAFT-BART for variable selection among incomplete clustered survival data. We conduct an extensive simulation study to examine the practical operating characteristics of our proposed methods. Finally, we demonstrate the methods via a case study that identifies predictors of in-hospital mortality among patients with severe COVID-19 and estimates the heterogeneous treatment effects of three COVID-specific medications. The methods developed in this work are readily available in the R package $\textsf{riAFTBART}$.
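To make the permutation idea concrete, the following is a minimal Python sketch of permutation-based selection on inclusion proportions. It is not the riAFT-BART procedure and does not use the riAFTBART package. The function `inclusion_proportions` is a hypothetical stand-in (normalised absolute covariance with the outcome) for the variable inclusion proportions a fitted tree ensemble would supply. The per-predictor quantile threshold, `n_perm`, `alpha`, and the toy data are all assumptions made for illustration, and the toy data ignore clustering and censoring entirely.

```python
import random


def inclusion_proportions(X, y):
    """Hypothetical stand-in for model-supplied variable inclusion proportions.

    Scores each column by its absolute covariance with y and normalises the
    scores to sum to 1. A real analysis would read these from the fitted model.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (v - y_mean) for c, v in zip(col, y)) / n
        scores.append(abs(cov))
    total = sum(scores) or 1.0  # guard against an all-zero score vector
    return [s / total for s in scores]


def permutation_select(X, y, n_perm=200, alpha=0.05, seed=0):
    """Keep predictors whose observed proportion exceeds a permutation threshold.

    The outcome is shuffled n_perm times to break its link with the predictors;
    each predictor's threshold is the (1 - alpha) empirical quantile of its own
    null proportions (an assumed thresholding rule).
    """
    rng = random.Random(seed)
    observed = inclusion_proportions(X, y)
    p = len(observed)
    null = [[] for _ in range(p)]
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        for j, value in enumerate(inclusion_proportions(X, y_perm)):
            null[j].append(value)
    selected = []
    for j in range(p):
        draws = sorted(null[j])
        k = min(len(draws) - 1, int((1 - alpha) * len(draws)))
        if observed[j] > draws[k]:
            selected.append(j)
    return selected


# Toy data: only the first of four predictors drives the outcome.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(300)]
y = [3.0 * row[0] + data_rng.gauss(0, 1) for row in X]
selected = permutation_select(X, y)
print(selected)
```

Shuffling the outcome keeps the joint distribution of the predictors intact while removing any association with the response, so the null proportions reflect how often a predictor would be favoured by chance alone.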