This paper uses the MIMIC-IV dataset to examine fairness and bias in an XGBoost binary classification model that predicts Intensive Care Unit (ICU) length of stay (LOS). Because the ICU is central to the care of critically ill patients and its capacity is under growing strain, LOS prediction is important for resource allocation. After data preprocessing and feature extraction, the analysis reveals class imbalances in the dataset across demographic attributes. While the XGBoost model performs well overall, its performance differs across race and insurance groups, which points to the need for subgroup-specific assessment and continuous monitoring. The paper concludes by recommending fairness-aware machine learning techniques to mitigate these biases and by calling for collaboration between healthcare professionals and data scientists.
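To make the idea of a subgroup disparity concrete, the following minimal sketch computes per-group true positive and false positive rates for a binary classifier and reports the largest gap between groups. It is an illustration only, not the paper's evaluation code: the helper names `group_rates` and `max_gap`, the group labels, the toy data, and the choice of rate gaps as the disparity measure are all assumptions introduced here.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group TPR and FPR for binary labels (1 = positive class).

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "fp" if p == 1 else "tn"
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
            "n": pos + neg,
        }
    return rates


def max_gap(rates, metric):
    """Largest between-group difference in the given metric, ignoring NaN."""
    values = [r[metric] for r in rates.values() if r[metric] == r[metric]]
    return max(values) - min(values) if values else float("nan")


# Toy data (assumed, not drawn from MIMIC-IV): two placeholder groups A and B.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
tpr_gap = max_gap(rates, "tpr")
fpr_gap = max_gap(rates, "fpr")
```

In practice the same computation would be run on held-out predictions with the group column set to a sensitive attribute such as race or insurance type; a large gap in either rate flags a subgroup for closer review.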