In the rapidly evolving healthcare industry, platforms now have access not only to traditional medical records but also to diverse data sets covering other patient interactions, such as activity on healthcare web portals. To make use of this diversity of data, we introduce WellFactor, a method that derives patient profiles by integrating information from these sources. Central to our approach is constrained low-rank approximation. WellFactor is designed to handle the sparsity that is often inherent in healthcare data. Moreover, by incorporating task-specific label information, our method refines the resulting embeddings, offering a more informed view of patients. An important feature of WellFactor is its ability to compute embeddings for new, previously unobserved patient data instantaneously, without revisiting the entire data set or recomputing the existing embeddings. Comprehensive evaluations on real-world healthcare data demonstrate WellFactor's effectiveness: it outperforms existing methods in classification, yields meaningful clusterings of patients, and delivers consistent results in patient similarity search and prediction.
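The abstract does not spell out how embeddings for unseen patients are obtained, so the following is only a minimal sketch of one common way such "fold-in" can work in a constrained low-rank setting. It assumes a nonnegative basis matrix `W` (features by rank) that was learned earlier and is held fixed, and it solves a small nonnegative least-squares problem for the new patient's embedding `h`, using only the observed entries of the patient's feature vector. The function name `embed_new_patient`, the `observed` mask, and the projected-gradient solver are all illustrative assumptions, not WellFactor's actual formulation.

```python
def embed_new_patient(W, x, observed=None, steps=500):
    """Fold a new patient vector x into a fixed basis W (n_features x k).

    Hypothetical sketch: solves
        min_{h >= 0}  0.5 * sum over observed i of (x[i] - W[i] . h)^2
    by projected gradient descent. W is never modified, so embeddings of
    existing patients do not need to be recomputed.
    """
    n, k = len(W), len(W[0])
    # Rows (features) that were actually observed for this patient.
    rows = [i for i in range(n) if observed is None or observed[i]]
    # Gram matrix G = W^T W and vector b = W^T x, restricted to observed rows.
    G = [[sum(W[i][a] * W[i][c] for i in rows) for c in range(k)]
         for a in range(k)]
    b = [sum(W[i][a] * x[i] for i in rows) for a in range(k)]
    # Step size 1/L, where L = trace(G) bounds the largest eigenvalue of G.
    L = sum(G[a][a] for a in range(k)) or 1.0
    h = [0.0] * k
    for _ in range(steps):
        # Gradient of the objective is G h - b.
        grad = [sum(G[a][c] * h[c] for c in range(k)) - b[a] for a in range(k)]
        # Gradient step followed by projection onto the nonnegative orthant.
        h = [max(0.0, h[a] - grad[a] / L) for a in range(k)]
    return h


# Toy basis with 3 features and rank 2 (made-up numbers for illustration).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

# New patient with the third feature unobserved (its stored value is ignored).
x = [2.0, 3.0, 0.0]
h = embed_new_patient(W, x, observed=[True, True, False])

# The embedding can then be used to estimate the missing feature.
estimate = sum(W[2][a] * h[a] for a in range(2))
```

Because only a rank-sized problem is solved per patient, the cost of embedding a new record in this sketch is independent of the number of patients already in the data set, which is the property the abstract highlights.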