An Overview of Modern Machine Learning Methods for Effect Measure Modification Analyses in High-Dimensional Settings

A primary concern of public health researchers is identifying and quantifying heterogeneous exposure effects across population subgroups. Understanding the magnitude and direction of these effects on a given scale allows researchers to recommend policies and assess the external validity of findings. Furthermore, the growing popularity of fields such as precision medicine, which rely on accurate estimation of high-dimensional interaction effects, has highlighted the importance of understanding effect modification. Traditional methods for effect measure modification analyses rely on parametric regression modeling, either through stratified analyses with corresponding heterogeneity tests or through an interaction term in a multivariable model. However, these methods require manual model specification and are often impractical or infeasible in high-dimensional settings. Recent developments in machine learning aim to address this issue by automating heterogeneous subgroup identification and effect estimation. In this paper, we summarize modern machine learning methods for effect measure modification analyses and provide the intuition behind them, to serve as a reference for public health researchers. We discuss their implementation in R, provide annotated syntax, and review available supplemental analysis tools, using as a case study the heterogeneous effects of drought on stunting among children in the Demographic and Health Survey data set.

