Electronic Health Records (EHRs) contain a wealth of patient data; however, the sparsity of EHR data often presents significant challenges for predictive modeling. Conventional imputation methods do not adequately distinguish between real and imputed data, which can lead to inaccurate patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data by leveraging prototype representations of similar patients, thus ensuring compact representations that preserve patient information. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature in light of its missing status. Additionally, PRISM introduces a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance on in-hospital mortality and 30-day readmission prediction tasks, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have made the code publicly available at https://github.com/yhzhu99/PRISM.
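To illustrate the idea of a similarity metric that accounts for feature confidence, the following is a minimal sketch and not PRISM's actual formulation, which the abstract does not specify. It assumes each patient has a feature vector plus a per-feature confidence in [0, 1] (here set by hand, whereas PRISM learns it), and it uses a confidence-weighted cosine similarity as a stand-in technique. The function name `confidence_weighted_similarity` and all toy values are hypothetical.

```python
import numpy as np


def confidence_weighted_similarity(x_a, x_b, conf_a, conf_b, eps=1e-8):
    """Cosine similarity with each feature weighted by both patients' confidences.

    Hypothetical illustration only: a feature contributes fully when it is
    trusted for both patients and contributes little when either value is
    a low-confidence (e.g. imputed) one.
    """
    w = conf_a * conf_b  # joint per-feature confidence
    num = np.sum(w * x_a * x_b)
    den = np.sqrt(np.sum(w * x_a ** 2)) * np.sqrt(np.sum(w * x_b ** 2)) + eps
    return float(num / den)


# Toy example: two patients agree on the first two features; the third
# feature of patient B is assumed to be an unreliable imputed value.
x_a = np.array([1.0, 2.0, 0.5])
x_b = np.array([1.0, 2.0, -3.0])
full_conf = np.ones(3)                  # every feature treated as observed
low_conf_b = np.array([1.0, 1.0, 0.1])  # assumed low confidence for the imputed feature

plain = confidence_weighted_similarity(x_a, x_b, full_conf, full_conf)
weighted = confidence_weighted_similarity(x_a, x_b, full_conf, low_conf_b)
print(f"plain cosine: {plain:.3f}, confidence-weighted: {weighted:.3f}")
```

In this toy case, down-weighting the imputed feature raises the similarity between the two patients, because the disagreement comes only from a value that is not trusted.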