Machine learning (ML) has become a critical tool in public health, with the potential to improve population health, diagnosis, treatment selection, and health system efficiency. However, biases in data and model design can produce disparities for certain protected groups and amplify existing inequalities in healthcare. To address this challenge, this study summarizes seminal literature on ML fairness and presents a framework for identifying and mitigating biases in the data and the model. The framework provides guidance on incorporating fairness into different stages of the typical ML pipeline, such as data processing, model design, deployment, and evaluation. To illustrate the impact of data biases on ML models, we present examples showing how systematic biases can be amplified through model predictions. These case studies suggest how the framework can be used to prevent such biases and highlight the need for fair and equitable ML models in public health. This work aims to inform and guide the use of ML in public health toward more ethical and equitable outcomes for all populations.