Electronic Health Record (EHR) data, while rich in information, is often sparse, which poses significant challenges for predictive modeling. Traditional imputation methods do not adequately distinguish between real and imputed values, which can lead to inaccurate models. To address this, we introduce PRISM, a novel approach that imputes data indirectly through prototype representations of similar patients, yielding denser and more accurate embeddings. PRISM further introduces a feature confidence learner module, which evaluates the reliability of each feature in light of missing data. It also incorporates a novel patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate PRISM's superior performance on in-hospital mortality and 30-day readmission prediction tasks, showing its effectiveness in handling EHR data sparsity. For reproducibility and further research, we have made the code publicly available at https://github.com/yhzhu99/PRISM.
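To make the idea of a confidence-aware patient similarity concrete, the following is a minimal sketch, not PRISM's actual implementation. It assumes a cosine similarity in which each feature is weighted by the product of the two patients' confidences, and it uses a hand-set confidence (1.0 for observed values, a fixed low value for imputed ones) as a stand-in for the learned feature confidence. The names `mask_to_confidence`, `confidence_weighted_similarity`, and `imputed_conf` are hypothetical and do not come from the PRISM code base.

```python
import numpy as np


def mask_to_confidence(observed_mask, imputed_conf=0.2):
    """Map an observation mask to per-feature confidences.

    Assumption: observed features get confidence 1.0 and imputed features
    get a fixed low value. PRISM learns these confidences instead.
    """
    observed_mask = np.asarray(observed_mask, dtype=bool)
    return np.where(observed_mask, 1.0, imputed_conf)


def confidence_weighted_similarity(x_a, x_b, conf_a, conf_b):
    """Cosine similarity with per-feature weights conf_a * conf_b.

    Features that are unreliable for either patient contribute less to
    both the numerator and the norms.
    """
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    w = np.asarray(conf_a, dtype=float) * np.asarray(conf_b, dtype=float)
    num = np.sum(w * x_a * x_b)
    den = np.sqrt(np.sum(w * x_a ** 2)) * np.sqrt(np.sum(w * x_b ** 2))
    # Guard against all-zero weighted vectors.
    return float(num / den) if den > 0 else 0.0


# Two patients with three features; patient A's second feature was imputed.
x_a = np.array([1.0, 0.0, 2.0])
x_b = np.array([1.0, 5.0, 2.0])
conf_a = mask_to_confidence([True, False, True])
conf_b = mask_to_confidence([True, True, True])

weighted = confidence_weighted_similarity(x_a, x_b, conf_a, conf_b)
unweighted = confidence_weighted_similarity(x_a, x_b, np.ones(3), np.ones(3))
print(f"weighted={weighted:.3f}, unweighted={unweighted:.3f}")
```

In this toy example the imputed feature is the only one on which the two patients disagree, so down-weighting it raises their similarity relative to plain cosine similarity. That is the behavior the abstract describes as avoiding overreliance on imprecise imputed values.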