In the age of digital epidemiology, epidemiologists are faced with increasing amounts of data of growing complexity and dimensionality. Machine learning offers a set of powerful tools for analyzing such large volumes of data. This chapter lays the methodological foundations for successfully applying machine learning in epidemiology. It covers the principles of supervised and unsupervised learning and discusses the most important machine learning methods. It then develops strategies for model evaluation and hyperparameter optimization and introduces interpretable machine learning. All theoretical parts are accompanied by code examples in R, using an example dataset on heart disease throughout the chapter.