Objectives: Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. Yet bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data. Methods: We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 1, 2010, and December 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development process, and analyzed metrics for bias assessment. Results: Of the 450 articles retrieved, 20 met our criteria, revealing six major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks in healthcare settings. Four studies focused on detecting implicit and algorithmic biases, employing fairness metrics such as statistical parity, equal opportunity, and predictive equity. Sixteen studies proposed strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance (e.g., accuracy, AUROC) and fairness metrics, predominantly involved data collection and preprocessing techniques such as resampling, reweighting, and transformation. Discussion: This review highlights the varied and evolving nature of strategies to address bias in EHR-based AI models, emphasizing the urgent need to establish standardized, generalizable, and interpretable methodologies to foster the creation of ethical AI systems that promote fairness and equity in healthcare.
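To make the metrics and one mitigation technique named in the abstract concrete, the sketch below computes the statistical parity difference and equal opportunity difference for a binary classifier over a binary protected group, and applies the Kamiran–Calders reweighting scheme often used as a preprocessing mitigation. This is an illustrative minimal implementation in plain Python, not the tooling used by any of the reviewed studies; all function names and the toy data are our own.

```python
from collections import Counter


def statistical_parity_diff(y_pred, group):
    """Statistical parity: P(yhat=1 | group=0) - P(yhat=1 | group=1)."""
    rates = []
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        rates.append(sum(preds) / len(preds))
    return rates[0] - rates[1]


def equal_opportunity_diff(y_true, y_pred, group):
    """Equal opportunity: gap in true-positive rate between groups."""
    tprs = []
    for g in (0, 1):
        preds = [p for p, t, a in zip(y_pred, y_true, group)
                 if a == g and t == 1]
        tprs.append(sum(preds) / len(preds))
    return tprs[0] - tprs[1]


def reweight(y_true, group):
    """Preprocessing mitigation by reweighting (Kamiran & Calders):
    each sample gets weight P(g) * P(y) / P(g, y), so group membership
    and label become statistically independent under the weights."""
    n = len(y_true)
    p_g = Counter(group)          # counts per group
    p_y = Counter(y_true)         # counts per label
    p_gy = Counter(zip(group, y_true))  # joint counts
    return [p_g[g] * p_y[y] / (n * p_gy[(g, y)])
            for g, y in zip(group, y_true)]


# Toy example: 6 patients, binary outcome, binary protected attribute.
y_true = [1, 1, 0, 0, 1, 0]
group = [0, 0, 0, 1, 1, 1]
y_pred = [1, 0, 0, 1, 1, 0]

spd = statistical_parity_diff(y_pred, group)
eod = equal_opportunity_diff(y_true, y_pred, group)
weights = reweight(y_true, group)
```

A nonzero gap on either metric flags a disparity between groups; the weights sum to the sample size and can be passed to any learner that accepts per-sample weights, which is why reweighting appears among the preprocessing strategies the review catalogs.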