Dynamic treatment regimes, or policies, are sequences of decision functions over multiple stages that are tailored to individual features. One important class of treatment policies in practice, namely multi-stage stationary treatment policies, prescribes treatment assignment probabilities using the same decision function across stages, where the decision is based on the same set of features consisting of time-evolving variables (e.g., routinely collected disease biomarkers). Although there is an extensive literature on constructing valid inference for the value function associated with dynamic treatment policies, little work has focused on inference for the policies themselves, especially in the presence of high-dimensional feature variables. We aim to fill this gap. Specifically, we first estimate the multi-stage stationary treatment policy using an augmented inverse probability weighted (AIPW) estimator of the value function to increase asymptotic efficiency, and we further apply a penalty to select important feature variables. We then construct one-step improvements of the policy parameter estimators to enable valid inference. Theoretically, we show that the improved estimators are asymptotically normal even when the nuisance parameters are estimated at slow convergence rates and the dimension of the feature variables grows with the sample size. Our numerical studies demonstrate that the proposed method estimates a sparse policy with a near-optimal value function and conducts valid inference for the policy parameters.
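For orientation, the display below gives a minimal single-stage sketch of a standard AIPW value estimator for a stochastic policy, a penalized policy estimate, and a generic one-step update. The notation (features $X_i$, treatment $A_i$, outcome $Y_i$, estimated propensity $\hat e$, estimated outcome regression $\hat Q$, policy class $\pi_\theta$, penalty level $\lambda$) is assumed here purely for illustration and is not taken from the paper, whose estimator is defined over multiple stages with the same parameter $\theta$ entering the decision function at every stage.
\[
\hat V(\pi_\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\pi_\theta(A_i \mid X_i)}{\hat e(A_i \mid X_i)}\bigl\{Y_i - \hat Q(X_i, A_i)\bigr\} + \sum_{a}\pi_\theta(a \mid X_i)\,\hat Q(X_i, a)\right],
\qquad
\hat\theta_{\lambda} \;=\; \arg\max_{\theta}\;\hat V(\pi_\theta) - \lambda\lVert\theta\rVert_{1},
\]
\[
\hat\theta^{\mathrm{os}} \;=\; \hat\theta_{\lambda} + \hat\Sigma^{-1}\,\hat S(\hat\theta_{\lambda}),
\]
where $\hat S(\theta)$ denotes an estimate of the gradient of the value function with respect to $\theta$ and $\hat\Sigma$ an estimate of the corresponding curvature matrix. The one-step update plays the role of the debiasing step described above: it removes the bias introduced by the penalty so that normal-based confidence intervals for components of $\theta$ can be justified, in the spirit of desparsified penalized estimators.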