Electronic Health Record (EHR) data, while rich in information, are often sparse, which poses significant challenges for predictive modeling. Traditional imputation methods do not adequately distinguish observed values from imputed ones, which can introduce inaccuracies into downstream models. To address this, we introduce PRISM, a framework that imputes data indirectly through prototype representations of similar patients, yielding denser and more accurate embeddings. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature in light of missing data. In addition, it incorporates a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance on in-hospital mortality and 30-day readmission prediction tasks, showing its effectiveness in handling EHR data sparsity. For reproducibility and further research, we have made the code publicly available at https://github.com/yhzhu99/PRISM.
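To make the idea of a confidence-aware similarity concrete, the following is a minimal sketch and not the implementation in the PRISM repository. It assumes each patient is a feature vector with a per-feature confidence in [0, 1] (near 0 for imputed values, near 1 for observed ones), uses a weighted cosine similarity as a stand-in for the paper's metric, and builds a prototype as a similarity-weighted average. The function names `confidence_weighted_similarity` and `similarity_weighted_prototype` are hypothetical.

```python
import math


def confidence_weighted_similarity(x, conf_x, y, conf_y):
    """Cosine similarity with each feature weighted by the product of
    both patients' confidences (an assumed stand-in for PRISM's metric)."""
    weights = [a * b for a, b in zip(conf_x, conf_y)]
    dot = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    norm_x = math.sqrt(sum(w * xi * xi for w, xi in zip(weights, x)))
    norm_y = math.sqrt(sum(w * yi * yi for w, yi in zip(weights, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # no jointly trusted, non-zero features to compare
    return dot / (norm_x * norm_y)


def similarity_weighted_prototype(embeddings, similarities):
    """Average neighbor embeddings, weighting each by its non-negative
    similarity (hypothetical; the paper's prototype construction may differ)."""
    weights = [max(s, 0.0) for s in similarities]
    total = sum(weights)
    if total == 0.0:
        # Fall back to a plain mean when no neighbor is similar.
        weights = [1.0] * len(embeddings)
        total = float(len(embeddings))
    dim = len(embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, embeddings)) / total
        for d in range(dim)
    ]


# Patient b's second feature is imputed, so it gets zero confidence
# and does not influence the similarity.
sim = confidence_weighted_similarity(
    [1.0, 2.0], [1.0, 1.0],
    [1.0, -5.0], [1.0, 0.0],
)
proto = similarity_weighted_prototype([[0.0, 0.0], [2.0, 4.0]], [1.0, 1.0])
```

In this toy case the imputed value of -5.0 would make an unweighted cosine similarity negative, whereas the confidence-weighted version compares only the trusted first feature.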