Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review

The Targeted Maximum Likelihood Estimation (TMLE) framework for statistical data analysis integrates machine learning, statistical theory and statistical inference to provide a minimally biased, efficient and robust strategy for the estimation of, and inference on, a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. We conducted a systematic literature review in PubMed of articles that applied any form of TMLE in observational studies. We summarised the epidemiological discipline, geographical location, author expertise and TMLE methods over time, and used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications. We showcase the contributions of these TMLE results to the literature. Of the 81 publications included, 25% originated from the University of California at Berkeley, where Professor Mark van der Laan first developed the framework. By the first half of 2022, 70% of the publications originated from outside the United States, and publications from 2021-22 spanned up to 7 different epidemiological disciplines. Double robustness, bias reduction and concerns about model misspecification were the main motivations that drew researchers towards the TMLE framework. Over time, applied studies cited a wide variety of methodological, tutorial and software-specific articles, reflecting the steady growth of methodological developments around TMLE. There is a clear trend of the TMLE framework spreading to various epidemiological disciplines and to a growing number of geographical areas. The availability of R packages, the publication of tutorial papers and the involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that recognised the benefits of TMLE and adopted it.
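The abstract names double robustness as the main reason researchers adopt TMLE. To make that property concrete, here is a minimal, illustrative Python sketch that is not drawn from any of the reviewed studies. It estimates an average treatment effect (ATE) for a binary treatment A and a binary outcome Y with a single binary confounder W. The cell counts, the helper names (`make_data`, `tmle_ate`) and the use of simple saturated or marginal models in place of machine-learning fits are all hypothetical choices. The targeting step follows the usual TMLE recipe: a clever covariate, a logistic fluctuation with the initial fit as offset (solved here by Newton-Raphson), and a plug-in of the updated predictions.

```python
# Illustrative sketch only: hypothetical data and helper names.
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def mean(xs):
    return sum(xs) / len(xs)


def make_data():
    """Hypothetical toy data as (w, a, y) rows; W confounds A and Y."""
    cells = [  # (w, a, y, count)
        (0, 1, 1, 6), (0, 1, 0, 14),   # W=0, treated: 6/20 events
        (0, 0, 1, 16), (0, 0, 0, 64),  # W=0, untreated: 16/80 events
        (1, 1, 1, 56), (1, 1, 0, 24),  # W=1, treated: 56/80 events
        (1, 0, 1, 12), (1, 0, 0, 8),   # W=1, untreated: 12/20 events
    ]
    return [(w, a, y) for w, a, y, k in cells for _ in range(k)]


def tmle_ate(data, q_init, g, tol=1e-10, max_iter=100):
    """TMLE of the ATE E[Y(1)] - E[Y(0)] for binary A and Y.

    q_init(a, w): initial estimate of P(Y=1 | A=a, W=w)
    g(w):         estimate of the propensity score P(A=1 | W=w)
    """
    def h(a, w):  # clever covariate
        return a / g(w) - (1 - a) / (1 - g(w))

    # Targeting step: logistic regression of Y on H with offset logit(q_init)
    # and no intercept; the single coefficient eps is found by Newton-Raphson.
    eps = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for w, a, y in data:
            hi = h(a, w)
            p = expit(logit(q_init(a, w)) + eps * hi)
            score += hi * (y - p)
            info += hi * hi * p * (1.0 - p)
        if abs(score) < tol:
            break
        eps += score / info

    def q_star(a, w):  # targeted (updated) outcome regression
        return expit(logit(q_init(a, w)) + eps * h(a, w))

    # Substitution estimator: average the updated counterfactual predictions.
    return mean([q_star(1, w) - q_star(0, w) for w, _, _ in data])


data = make_data()

# Correctly specified (saturated) propensity score P(A=1 | W=w).
g_by_w = {w0: mean([a for w, a, _ in data if w == w0]) for w0 in (0, 1)}

# Deliberately misspecified outcome model: ignores the confounder W.
q_by_a = {a0: mean([y for _, a, y in data if a == a0]) for a0 in (0, 1)}

# Correctly specified (saturated) outcome model, for comparison.
q_by_aw = {(a0, w0): mean([y for w, a, y in data if a == a0 and w == w0])
           for a0 in (0, 1) for w0 in (0, 1)}

naive = q_by_a[1] - q_by_a[0]
tmle_bad_q = tmle_ate(data, lambda a, w: q_by_a[a], lambda w: g_by_w[w])
tmle_good_q = tmle_ate(data, lambda a, w: q_by_aw[(a, w)], lambda w: g_by_w[w])
print(f"naive={naive:.3f} tmle_bad_q={tmle_bad_q:.3f} tmle_good_q={tmle_good_q:.3f}")
```

Running the script prints `naive=0.340 tmle_bad_q=0.100 tmle_good_q=0.100`. The unadjusted difference is confounded by W. Both TMLE fits recover the stratum-standardised effect of 0.1, even when the outcome model ignores W, because the propensity score is correctly specified. The exact agreement depends on the saturated, empirical propensity score used in this toy setting. In applied work the initial fits usually come from flexible learners such as Super Learner, for example through the R `tmle` package.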

翻译：目标最大似然估计（TMLE）统计数据分析框架整合了机器学习、统计理论与统计推断，为多种统计参数和因果参数的估计与推断提供了偏倚最小、高效且稳健的策略。本文描述并评估了得益于最新方法学发展的流行病学应用。我们在PubMed中对应用任何形式TMLE的观察性研究进行了系统性文献综述。我们总结了流行病学科领域、地理位置、作者专业背景以及不同时期的TMLE方法。参照目标学习与因果推断路线图，提取了出版物中的关键方法学要素，并展示了这些TMLE研究对文献的贡献。在纳入的81篇出版物中，25%来自该框架的首创者马克·范德兰教授所在的加利福尼亚大学伯克利分校。截至2022年上半年，70%的出版物来自美国境外，且在2021-2022年间涉及多达7个不同的流行病学科领域。双重稳健性、偏倚减少和模型设定错误是吸引研究者采用TMLE框架的主要动机。随着TMLE方法学研究的持续发展，大量涉及方法学、教程及特定软件的文章被引用。TMLE框架正呈现出向多学科领域及更多地理区域扩散的明显趋势。R语言软件包的可得性、教程论文的发表以及方法学专家参与应用研究，共同促进了越来越多研究对TMLE优势的理解与采纳，使其应用数量呈指数级增长。