Electronic health records (EHRs) are an essential data source for the envisioned artificial intelligence (AI)-driven transformation of healthcare. However, clinician biases reflected in EHR notes can lead AI models to inherit and amplify those biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction, using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings show that SL written by clinicians adversely affects AI performance, particularly for Black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate the impact of SL, we examine patterns in how SL is generated across a collaborative network of clinicians and find that central clinicians have a stronger impact on racial disparity in the AI model. Removing SL written by central clinicians is a more efficient bias-reduction strategy than eliminating all SL from the entire corpus. This study provides actionable insights for responsible AI development and contributes to the understanding of clinician behavior and EHR note writing in healthcare.