Valid post-selection inference for penalized G-estimation

Understanding treatment effect heterogeneity is important for decision making in medical and clinical practice, as well as in engineering and marketing applications. When covariates are high-dimensional, or when effect modifiers are not predefined and must be discovered, data-adaptive selection approaches become essential. However, data-driven model selection complicates the quantification of statistical uncertainty, because the sampling distribution of the target estimator after selection is difficult to approximate. Data-driven model selection tends to favor models with strong effect modifiers, at the cost of inflated type I error rates. Although several frameworks and methods for valid statistical inference following data-driven model selection have been proposed for ordinary least squares regression, fewer options exist for valid inference for effect modifier discovery in causal modeling contexts. In this article, we extend two different methods to develop valid inference for penalized G-estimation, which investigates effect modification of proximal treatment effects within the structural nested mean model framework. We show the asymptotic validity of the proposed methods. Using extensive simulation studies, we evaluate and compare the finite-sample performance of the proposed methods and of naive inference based on a sandwich variance estimator. Our work is motivated by a study of hemodiafiltration for treating patients with end-stage renal disease at the Centre Hospitalier de l'Université de Montréal. We apply these methods to draw inference about the effect heterogeneity of dialysis facility on the repeated session-specific hemodiafiltration outcomes.
