Survival analysis is a widely used statistical framework for modeling time-to-event data under censoring. Classical methods, such as the Cox proportional hazards (Cox PH) model, offer a semiparametric approach to estimating the effects of covariates on the hazard function. Despite its importance, survival analysis remains largely unexplored in online settings, particularly within the bandit framework, where treatment decisions must be made sequentially and refined as new data arrive over time. In this work, we take an initial step toward integrating survival analysis into a purely online learning setting under the Cox PH model, addressing key challenges including staggered entry, delayed feedback, and right censoring. We adapt three canonical bandit algorithms to balance exploration and exploitation, with theoretical guarantees of sublinear regret. Extensive simulations and semi-real experiments using SEER cancer data demonstrate that our approach enables rapid and effective learning of near-optimal treatment policies.
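For readers less familiar with the Cox PH model, it posits a hazard h(t | x) = h0(t) exp(β^T x), and β is usually estimated by maximizing Cox's partial likelihood, in which the baseline hazard h0 cancels and right-censored subjects contribute only through the risk sets of observed events. The sketch below is illustrative and not the estimator or bandit algorithms used in this work. It assumes no tied event times and fits β with plain gradient descent. The function names (`cox_neg_log_partial_likelihood`, `cox_gradient`, `fit_cox`) and the learning-rate settings are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _linear_predictors(beta: Sequence[float], X: Matrix) -> List[float]:
    # eta_i = beta^T x_i for every subject
    return [sum(b * xk for b, xk in zip(beta, x)) for x in X]


def cox_neg_log_partial_likelihood(beta: Sequence[float], X: Matrix,
                                   times: Sequence[float],
                                   events: Sequence[int]) -> float:
    """-log L(beta) for the Cox PH model, assuming no tied event times.

    events[i] == 1: event observed at times[i]; 0: right-censored at times[i].
    The baseline hazard h0(t) cancels, so it never needs to be estimated.
    """
    eta = _linear_predictors(beta, X)
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects appear only inside risk sets
        # Risk set: subjects still under observation at time t_i
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp shift for numerical stability
        nll -= eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return nll


def cox_gradient(beta: Sequence[float], X: Matrix,
                 times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Gradient of the negative log partial likelihood with respect to beta."""
    eta = _linear_predictors(beta, X)
    grad = [0.0] * len(beta)
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue
        idx = [j for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(eta[j] for j in idx)
        w = [math.exp(eta[j] - m) for j in idx]
        total = sum(w)
        for k in range(len(beta)):
            # Risk-set average of covariate k, weighted by exp(eta_j)
            mean_k = sum(wj * X[j][k] for wj, j in zip(w, idx)) / total
            grad[k] -= X[i][k] - mean_k
    return grad


def fit_cox(X: Matrix, times: Sequence[float], events: Sequence[int],
            lr: float = 0.1, n_iter: int = 500) -> List[float]:
    """Estimate beta by plain gradient descent (an illustrative choice)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = cox_gradient(beta, X, times, events)
        beta = [b - lr * gk for b, gk in zip(beta, g)]
    return beta
```

In an online setting of the kind described above, one would presumably re-estimate β as delayed outcomes arrive, with patients whose events have not yet occurred treated as right-censored at the current time; how the adapted bandit algorithms use such estimates is not shown here.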