A primary concern of public health researchers is identifying and quantifying heterogeneous exposure effects across population subgroups. Understanding the magnitude and direction of these effects on a given scale allows researchers to recommend policy prescriptions and assess the external validity of findings. Furthermore, the growing popularity of fields such as precision medicine, which rely on accurate estimation of high-dimensional interaction effects, has highlighted the importance of understanding effect modification. Traditional methods for effect measure modification analysis rely on parametric regression modeling, using either stratified analyses with corresponding heterogeneity tests or an interaction term in a multivariable model. However, these methods require manual model specification and are often impractical or infeasible in high-dimensional settings. Recent developments in machine learning aim to address this issue by automating heterogeneous subgroup identification and effect estimation. In this paper, we summarize modern machine learning methods for effect measure modification analysis and provide the intuition behind them, to serve as a reference for public health researchers. We discuss their implementation in R, provide annotated syntax, and review available supplemental analysis tools, using as a case study the heterogeneous effects of drought on stunting among children in the Demographic and Health Survey data set.