In healthcare, there is much interest in estimating policies, that is, mappings from covariates to treatment decisions. Recently, there has also been interest in constraining these estimated policies to the standard of care, which generated the observed data. A relative sparsity penalty was proposed to derive policies that have sparse, explainable differences from the standard of care, which makes the new policy easier to justify. However, the developers of this penalty considered only estimation, not inference. Here, we develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine. Further, the relative sparsity work considered only the single-stage decision case; here, we consider the more general multi-stage case. Inference is difficult because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case. Further, one must deal with a non-differentiable penalty. To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference. We study the asymptotic behavior of our proposed approaches, perform extensive simulations, and analyze a real electronic health record dataset.