Heterogeneous treatment effects (HTE) based on patients' genetic or clinical factors are of significant interest in precision medicine. Simultaneously modeling HTE and the corresponding main effects in randomized clinical trials with high-dimensional predictive markers is challenging. Motivated by the modified covariates approach, we propose a two-stage statistical learning procedure for estimating HTE with optimal efficiency augmentation that generalizes to arbitrary interaction models and exploits powerful extreme gradient boosting trees (XGBoost). Target estimands for HTE are defined on the mean-difference scale for quantitative outcomes and on the risk-ratio scale for binary outcomes, and are the minimizers of specialized loss functions. The first stage estimates the main-effect equivalency of the baseline markers on the outcome, which is then used as an augmentation term in the second-stage estimation of HTE. The proposed two-stage procedure is robust to model mis-specification of the main effects and improves efficiency for estimating HTE through nonparametric function estimation, e.g., XGBoost. A permutation test is proposed for global assessment of evidence for HTE. An analysis of a genetic study in the Prostate Cancer Prevention Trial, led by the SWOG Cancer Research Network, is conducted to showcase the properties and utility of the two-stage method.
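As a rough illustration of the two-stage idea for a quantitative outcome, the sketch below uses ordinary least squares in place of XGBoost and assumes 1:1 randomization with treatment coded as -1/+1. The function names (`two_stage_hte`, `predict_hte`), the simulated data, and the linear working models are all assumptions made for this example and are not the procedure as specified in the paper. Stage one fits the outcome on the baseline markers while ignoring treatment; stage two regresses the augmented outcome on the modified covariates, i.e. the markers multiplied by T/2.

```python
import numpy as np


def two_stage_hte(X, T, Y):
    """Illustrative two-stage fit (linear stand-in for a nonparametric learner).

    X: (n, p) baseline markers; T: (n,) treatment coded -1/+1;
    Y: (n,) quantitative outcome. Returns coefficients of a linear HTE model.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])

    # Stage 1: estimate the main effect of the markers on the outcome,
    # ignoring treatment; the fitted values act as the augmentation term.
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    augmentation = design @ beta

    # Stage 2: modified covariates are the design columns multiplied by T/2,
    # so the fitted linear predictor targets E[Y|T=+1,X] - E[Y|T=-1,X].
    modified = design * (T / 2.0)[:, None]
    gamma, *_ = np.linalg.lstsq(modified, Y - augmentation, rcond=None)
    return gamma


def predict_hte(gamma, X):
    """Predicted mean-difference treatment effect for each row of X."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ gamma


# Simulated trial (hypothetical data): the true HTE is 1 + 2 * X[:, 0].
rng = np.random.default_rng(0)
n, p = 4000, 3
X = rng.normal(size=(n, p))
T = rng.choice([-1.0, 1.0], size=n)
true_hte = 1.0 + 2.0 * X[:, 0]
Y = 0.5 * X[:, 1] + (T / 2.0) * true_hte + rng.normal(scale=0.5, size=n)

gamma = two_stage_hte(X, T, Y)
hte_hat = predict_hte(gamma, X)
```

In this simulation the stage-two coefficients should land close to the true values (intercept 1, slope 2 on the first marker, and 0 elsewhere). Swapping either least-squares fit for a gradient boosting model would bring the sketch closer to the nonparametric setting described above, at the cost of an extra dependency.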