Treatment effect estimation (TEE) is the task of determining the impact of various treatments on patient outcomes. Current TEE methods fall short because they rely on limited labeled data and struggle with sparse, high-dimensional observational patient data. To address these challenges, we introduce a novel pre-training and fine-tuning framework, KG-TREAT, which synergizes large-scale observational patient data with biomedical knowledge graphs (KGs) to enhance TEE. Unlike previous approaches, KG-TREAT constructs dual-focus KGs and integrates a deep bi-level attention synergy method for in-depth information fusion, enabling distinct encoding of treatment-covariate and outcome-covariate relationships. KG-TREAT also incorporates two pre-training tasks to ensure a thorough grounding and contextualization of patient data and KGs. Evaluation on four downstream TEE tasks shows that KG-TREAT outperforms existing methods, with an average improvement of 7% in area under the ROC curve (AUC) and 9% in influence function-based precision of estimating heterogeneous effects (IF-PEHE). The estimated treatment effects are further validated by their agreement with established randomized clinical trial findings.