Dynamic treatment rules, or policies, are sequences of decision functions over multiple stages that are tailored to individual features. One class of treatment policies that is important in practice, namely multi-stage stationary treatment policies, prescribes treatment assignment probabilities using the same decision function at every stage, where the decision is based on the same set of features, consisting of both baseline variables (e.g., demographics) and time-evolving variables (e.g., routinely collected disease biomarkers). Although there is an extensive literature on valid inference for the value function associated with dynamic treatment policies, little work has addressed inference for the policies themselves, especially in the presence of high-dimensional feature variables. We aim to fill this gap. Specifically, we first estimate the multi-stage stationary treatment policy using an augmented inverse probability weighted estimator of the value function, which increases asymptotic efficiency, and apply a penalty to select important feature variables. We then construct a one-step improvement of the policy parameter estimators. Theoretically, we show that the improved estimators are asymptotically normal even if the nuisance parameters are estimated at a slow convergence rate and the dimension of the feature variables increases exponentially with the sample size. Our numerical studies demonstrate that the proposed method performs satisfactorily in small samples, and that its performance can be improved by choosing an augmentation term that approximates the rewards or minimizes the variance of the value function.
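To make the augmented inverse probability weighted (AIPW) idea concrete, the following is a minimal single-stage sketch, not the multi-stage estimator of this work. It assumes a binary treatment, a logistic policy with parameter `theta`, a known propensity function, and a user-supplied outcome model that serves as the augmentation term. All function names (`policy_prob`, `aipw_value`, `simulate`) and the simulated data-generating process are hypothetical and chosen only for illustration; the penalty and the one-step improvement are not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def policy_prob(theta, x):
    """Probability of assigning treatment 1 under a logistic policy (assumed form)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def aipw_value(theta, data, propensity, outcome_model):
    """Single-stage AIPW estimate of the value of the stochastic policy `theta`.

    data          : list of (x, a, r) with features x, treatment a in {0, 1}, reward r
    propensity    : x -> P(A = 1 | x) in the observed data
    outcome_model : (x, a) -> approximate expected reward (the augmentation term)
    """
    total = 0.0
    for x, a, r in data:
        p1 = policy_prob(theta, x)
        pi_a = p1 if a == 1 else 1.0 - p1      # policy probability of the observed action
        e1 = propensity(x)
        e_a = e1 if a == 1 else 1.0 - e1       # behaviour probability of the observed action
        # Model-based part: expected reward under the policy according to the outcome model.
        direct = p1 * outcome_model(x, 1) + (1.0 - p1) * outcome_model(x, 0)
        # Weighted residual: corrects the model-based part where the outcome model is wrong.
        correction = pi_a / e_a * (r - outcome_model(x, a))
        total += direct + correction
    return total / len(data)


def simulate(n, seed=0):
    """Toy randomized trial (hypothetical): treatment helps when x[1] > 0, harms otherwise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (1.0, rng.gauss(0.0, 1.0))         # intercept plus one feature
        a = 1 if rng.random() < 0.5 else 0     # treatment assigned with probability 0.5
        r = x[1] * (2 * a - 1) + rng.gauss(0.0, 1.0)
        data.append((x, a, r))
    return data
```

With a zero outcome model the estimator reduces to plain inverse probability weighting. Supplying a model closer to the true rewards leaves the target unchanged but shrinks the weighted residuals, which is the intuition behind choosing an augmentation term that approximates the rewards.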