In healthcare, there is much interest in estimating policies, or mappings from covariates to treatment decisions. Recently, there has also been interest in constraining these estimated policies to the standard of care, which generated the observed data. A relative sparsity penalty was proposed to derive policies that have sparse, explainable differences from the standard of care, facilitating justification of the new policy. However, the developers of this penalty considered only estimation, not inference. Here, we develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine. Further, in the relative sparsity work, the authors considered only the single-stage decision case; here, we consider the more general, multi-stage case. Inference is difficult, because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case. Further, one must deal with a non-differentiable penalty. To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference. We study the asymptotic behavior of our proposed approaches, perform extensive simulations, and analyze a real electronic health record dataset.