Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models

Objectives: Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. Yet bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data. Methods: We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 1, 2010, and December 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development process, and analyzed metrics for bias assessment. Results: Of the 450 articles retrieved, 20 met our criteria, revealing six major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks in healthcare settings. Four studies concentrated on the detection of implicit and algorithmic biases, employing fairness metrics such as statistical parity, equal opportunity, and predictive equity. Sixteen proposed various strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance metrics (e.g., accuracy, AUROC) and fairness metrics, predominantly involved data collection and preprocessing techniques such as resampling, reweighting, and transformation. Discussion: This review highlights the varied and evolving nature of strategies to address bias in EHR-based AI models, emphasizing the urgent need for standardized, generalizable, and interpretable methodologies to foster the creation of ethical AI systems that promote fairness and equity in healthcare.
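
The abstract names fairness metrics (statistical parity, equal opportunity) and preprocessing-based mitigation (reweighting) without giving formulas. As a minimal sketch, and not the method of any reviewed study, the Python below uses commonly cited definitions: statistical parity difference as the gap in positive-prediction rates between groups, equal opportunity difference as the gap in true-positive rates, and a common reweighing scheme that weights each record by P(group) * P(label) / P(group, label). The function names, the toy data, and the choice of "A" as the privileged group are all hypothetical.

```python
from collections import Counter


def positive_rate(y_pred, mask):
    """Fraction of positive predictions among the records selected by mask."""
    chosen = [p for p, m in zip(y_pred, mask) if m]
    return sum(chosen) / len(chosen)


def statistical_parity_difference(y_pred, group, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged)."""
    priv = [g == privileged for g in group]
    unpriv = [not m for m in priv]
    return positive_rate(y_pred, unpriv) - positive_rate(y_pred, priv)


def equal_opportunity_difference(y_true, y_pred, group, privileged):
    """True-positive-rate gap: TPR(unprivileged) - TPR(privileged)."""
    priv_pos = [g == privileged and y == 1 for g, y in zip(group, y_true)]
    unpriv_pos = [g != privileged and y == 1 for g, y in zip(group, y_true)]
    return positive_rate(y_pred, unpriv_pos) - positive_rate(y_pred, priv_pos)


def reweighing_weights(y_true, group):
    """Per-record weight P(group) * P(label) / P(group, label).

    Combinations that are rarer than independence would predict get
    weights above 1; over-represented combinations get weights below 1.
    """
    n = len(y_true)
    g_count = Counter(group)
    y_count = Counter(y_true)
    gy_count = Counter(zip(group, y_true))
    return [
        g_count[g] * y_count[y] / (n * gy_count[(g, y)])
        for g, y in zip(group, y_true)
    ]


# Hypothetical toy cohort: group membership, true outcome, model prediction.
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

spd = statistical_parity_difference(y_pred, group, privileged="A")
eod = equal_opportunity_difference(y_true, y_pred, group, privileged="A")
weights = reweighing_weights(y_true, group)

print("statistical parity difference:", spd)
print("equal opportunity difference:", eod)
print("reweighing weights:", weights)
```

A value of zero on either metric indicates parity between the two groups under that definition. The weights would typically be passed as per-sample weights when training a model, so that group membership and outcome look statistically independent in the weighted data.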