Data-Adaptive Identification of Effect Modifiers through Stochastic Shift Interventions and Cross-Validated Targeted Learning

In epidemiology, it is essential to identify subpopulations that are particularly vulnerable to exposures, as well as those who may benefit differently from exposure-reducing interventions. Characteristics such as age, gender-specific vulnerabilities, and physiological states including pregnancy are critical considerations for policymakers setting regulatory guidelines. However, current semi-parametric methods for estimating heterogeneous treatment effects are often limited to binary exposures and can function as black boxes, lacking clear, interpretable rules for subpopulation-specific policy interventions. This study introduces a novel method that pairs cross-validated targeted minimum loss-based estimation (TMLE) with a data-adaptive target parameter strategy. The method identifies the subpopulations in which simulated exposure-reducing policy interventions have the largest differential impact. Our approach is assumption-lean, allowing machine learning to be integrated while still yielding valid confidence intervals. We demonstrate the robustness of our methodology through simulations and an application to data from the National Health and Nutrition Examination Survey (NHANES). Our analysis of NHANES data on persistent organic pollutants (POPs) and leukocyte telomere length (LTL) identified age as a significant effect modifier. Specifically, we found that exposure to 3,3',4,4',5-pentachlorobiphenyl (PCNB) consistently had a differential impact on LTL: a one-standard-deviation reduction in exposure led to a more pronounced increase in LTL among younger populations than among older ones. We offer our method as an open-source software package, EffectXshift, which enables researchers to investigate effect modification of continuous exposures. EffectXshift provides clear, interpretable results to inform targeted public health interventions and policy decisions.
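
To make the data-adaptive search for an effect modifier concrete, here is a minimal Python sketch under stated assumptions. It simulates a standardized continuous exposure whose effect on the outcome is stronger below a hypothetical age of 45. It then applies the shift A -> A + delta with delta = -1, which corresponds to a one-standard-deviation reduction. Finally, it uses sample splitting: on each set of training folds it picks the age cutoff with the largest difference in shift effects between younger and older people, and it estimates that difference on the held-out fold only. This is an illustration, not the EffectXshift implementation. All function names (simulate, shift_effect, best_cutoff, cv_effect_modification), the simulated data, the cutoff grid, and the linear working model are hypothetical. The simple plug-in estimate also stands in for cross-validated TMLE, so it yields no valid confidence intervals.

```python
import random
import statistics

# Hypothetical illustration only: a cross-validated search for an age cutoff
# that modifies the effect of shifting a continuous exposure. It uses a linear
# working model and a plug-in estimate instead of TMLE (no valid CIs).

def simulate(n, seed=1):
    """Simulated rows (age, exposure, outcome). Exposure is standardized and
    independent of age, so no confounding adjustment is needed here."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        a = rng.gauss(0, 1)
        beta = 0.5 if age < 45 else 0.1  # younger people respond more strongly
        y = 1.0 - beta * a + rng.gauss(0, 0.5)
        rows.append((age, a, y))
    return rows

def slope(xs, ys):
    """Least-squares slope of y on x with an intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def shift_effect(rows, delta):
    """Plug-in effect of the shift A -> A + delta. Under the linear working
    model E[Y | A] = alpha + beta * A this equals beta * delta."""
    return slope([r[1] for r in rows], [r[2] for r in rows]) * delta

def split_by_age(rows, cutoff):
    """Split rows into (age < cutoff, age >= cutoff)."""
    return [r for r in rows if r[0] < cutoff], [r for r in rows if r[0] >= cutoff]

def best_cutoff(rows, delta, candidates, min_size=30):
    """Data-adaptive step: the cutoff maximizing the absolute difference in
    shift effects between the younger and older subgroups."""
    best, best_gap = None, -1.0
    for c in candidates:
        young, old = split_by_age(rows, c)
        if min(len(young), len(old)) < min_size:
            continue  # skip splits that leave a subgroup too small
        gap = abs(shift_effect(young, delta) - shift_effect(old, delta))
        if gap > best_gap:
            best, best_gap = c, gap
    return best

def cv_effect_modification(rows, delta, candidates, k=5, seed=0):
    """For each fold: choose the cutoff on the training folds, then estimate
    the young-minus-old shift effect on the held-out fold only."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    results = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        cutoff = best_cutoff(train, delta, candidates)
        young, old = split_by_age(folds[i], cutoff)
        diff = shift_effect(young, delta) - shift_effect(old, delta)
        results.append((cutoff, diff))
    return results

# delta = -1.0: a one-standard-deviation reduction in the standardized exposure.
results = cv_effect_modification(simulate(2000), delta=-1.0,
                                 candidates=range(30, 71, 5))
for cutoff, diff in results:
    print(f"cutoff={cutoff}, young-minus-old shift effect={diff:.3f}")
print("average:", round(statistics.fmean(d for _, d in results), 3))
```

The cutoff is chosen on the training folds and its effect difference is estimated on the held-out fold. This separation guards against the optimistic bias of selecting the most extreme subgroup and evaluating it on the same observations. In the approach the abstract describes, the linear plug-in would instead be replaced by cross-validated TMLE with machine-learning nuisance estimates.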
