Valid post-selection inference for penalized G-estimation

Understanding treatment effect heterogeneity is important for decision making in medical and clinical practice and for addressing various engineering and marketing challenges. When dealing with high-dimensional covariates or when the effect modifiers are not predefined and need to be discovered, data-adaptive selection approaches become essential. However, after data-driven model selection, quantifying statistical uncertainty is complicated because the sampling distribution of the target estimator is difficult to approximate. Data-driven model selection tends to favor models with strong effect modifiers, at the cost of inflated type I error rates. Although several frameworks and methods for valid statistical inference have been proposed for ordinary least squares regression following data-driven model selection, fewer options exist for valid inference for effect modifier discovery in causal modeling contexts. In this article, we extend two different methods to develop valid inference for penalized G-estimation, which investigates effect modification of proximal treatment effects within the structural nested mean model framework. We show the asymptotic validity of the proposed methods. Using extensive simulation studies, we evaluate and compare the finite-sample performance of the proposed methods and of naive inference based on a sandwich variance estimator. Our work is motivated by the study of hemodiafiltration for treating patients with end-stage renal disease at the Centre Hospitalier de l'Université de Montréal. We apply these methods to draw inference about the effect heterogeneity of dialysis facility on the repeated session-specific hemodiafiltration outcomes.
