We propose a Buckley–James (BJ) Boost Q-learning framework for estimating optimal dynamic treatment regimes from right-censored survival outcomes in longitudinal randomized clinical trials, motivated by the clinical need to support patient-specific treatment decisions when follow-up is incomplete and covariate effects may be nonlinear. The method combines accelerated failure time (AFT) modeling with iterative boosting using flexible base learners, including componentwise least squares and regression trees, within a counterfactual Q-learning framework. By modeling conditional survival time directly, BJ-Boost Q-learning avoids the proportional hazards assumption, yields clinically interpretable contrasts on the time scale, and enables estimation of stage-specific Q-functions and individualized decision rules under standard potential-outcomes assumptions. In contrast to Cox-based Q-learning, which relies on hazard modeling and can be sensitive to non-proportional hazards and model misspecification, our approach provides a robust and flexible alternative for regime learning. Simulation studies and analyses of the ACTG 175 HIV trial and the CALGB 8923 two-stage leukemia trial show that BJ-Boost Q-learning improves treatment decision accuracy and produces more stable within-participant counterfactual contrasts, particularly in multistage settings where estimation error and bias can compound across stages.
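The single-stage core of the approach can be illustrated with a minimal sketch: alternate Buckley–James imputation of censored log survival times (conditional expectations computed from the Kaplan–Meier distribution of the residuals) with a boosted regression fit on the imputed responses. This is an illustrative simplification, not the paper's implementation: the helper names (`km_survival`, `bj_impute`), the simulated AFT data, and the use of scikit-learn's `GradientBoostingRegressor` as a stand-in for the componentwise least squares / tree base learners are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def km_survival(times, events):
    """Kaplan-Meier survival estimate evaluated at the sorted times."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))          # number still at risk
    surv = np.cumprod(1.0 - d / at_risk)          # KM product-limit form
    return t, surv

def bj_impute(log_y, events, preds):
    """Buckley-James step: replace each censored log-time with
    E[log T | log T > log_y], estimated from the KM distribution of the
    residuals e = log_y - preds (hypothetical helper for illustration)."""
    resid = log_y - preds
    t, surv = km_survival(resid, events)
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    mass = surv_prev - surv                       # KM jump at each residual
    imputed = log_y.copy()
    for i in np.where(events == 0)[0]:            # loop over censored cases
        tail = t > resid[i]
        denom = mass[tail].sum()
        if denom > 0:                             # mean residual beyond c_i
            imputed[i] = preds[i] + (mass[tail] * t[tail]).sum() / denom
    return imputed

# Simulated right-censored AFT data (assumed setup, not trial data).
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
logT = 1.0 + X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=n)
logC = rng.normal(loc=2.0, scale=1.0, size=n)     # censoring times
log_y = np.minimum(logT, logC)
events = (logT <= logC).astype(float)             # 1 = event observed

# BJ-Boost iteration: impute, refit the boosted learner, repeat.
preds = np.full(n, log_y.mean())
for _ in range(5):
    y_star = bj_impute(log_y, events, preds)
    model = GradientBoostingRegressor(n_estimators=100, max_depth=2)
    model.fit(X, y_star)
    preds = model.predict(X)
```

In the multistage setting, the same imputation-plus-boosting loop would be applied backward over stages, with each stage's fitted values feeding the pseudo-outcomes of the previous stage's Q-function.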