Anemia is a prevalent medical condition whose diagnosis and monitoring typically require invasive blood tests. Electronic health records (EHRs) have become a valuable data source for many medical studies. Predicting hemoglobin level or anemia degree from EHRs is non-invasive and rapid. It remains challenging, however, because EHR data are typically irregular multivariate time series with many missing values and irregular time intervals between records. To address these issues, we introduce HgbNet, a machine learning model for hemoglobin level/anemia degree prediction that emulates clinicians' decision-making processes. The model incorporates a NanDense layer with a missing indicator to handle missing values, and it employs attention mechanisms to account for both local and global irregularity. We evaluate the proposed method on two real-world datasets in two use cases. In the first use case, we predict the hemoglobin level/anemia degree at moment T+1 using records from moments prior to T+1. In the second, we combine all historical records with selected additional test results from moment T+1 to predict the hemoglobin level/anemia degree at that same moment, T+1. HgbNet outperforms the best baseline on all datasets and in both use cases. These findings demonstrate the feasibility of estimating hemoglobin levels and anemia degree from EHR data. They position HgbNet as an effective non-invasive anemia diagnosis solution that could improve the quality of life of millions of affected individuals worldwide. To our knowledge, HgbNet is the first machine learning model that leverages EHR data for hemoglobin level/anemia degree prediction.
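The missing-indicator idea can be illustrated with a minimal sketch. This is not the authors' NanDense layer, whose internals are not described here. It assumes the common approach of zero-filling missing (NaN) inputs and appending a binary observed/missing mask, so that a dense layer sees both each value and whether it was recorded. The function name `missing_indicator_dense` and the plain-list weight layout are hypothetical choices made for illustration, and no activation is applied.

```python
import math
from typing import List


def missing_indicator_dense(
    x: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Hypothetical missing-aware linear layer (illustrative, not the paper's NanDense).

    NaN entries in `x` are replaced by 0.0, and a mask (1.0 = observed,
    0.0 = missing) is appended to the filled values. `weights` holds one
    row per output unit, each with 2 * len(x) entries (values, then mask).
    """
    filled = [0.0 if math.isnan(v) else v for v in x]
    mask = [0.0 if math.isnan(v) else 1.0 for v in x]
    features = filled + mask  # the layer sees values and missingness together

    out = []
    for row, b in zip(weights, bias):
        if len(row) != len(features):
            raise ValueError("each weight row needs 2 * len(x) entries")
        # Plain weighted sum plus bias; no nonlinearity in this sketch.
        out.append(sum(w * f for w, f in zip(row, features)) + b)
    return out


if __name__ == "__main__":
    # Three lab values for one visit, the second one not recorded.
    x = [1.0, float("nan"), 3.0]
    w = [
        [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # unit reading only the filled values
        [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],  # unit counting observed entries
    ]
    print(missing_indicator_dense(x, w, [0.0, 0.0]))  # [4.0, 2.0]
```

Appending the mask lets a learned layer distinguish "recorded as zero" from "not recorded". Zero-filling alone would conflate the two, which matters for EHR data where whether a test was ordered can itself be informative.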