Knowledge graphs (KGs) represent connections and relationships between real-world entities. We propose a link prediction framework for KGs named Enrichment-Driven GrAph Reasoner (EDGAR), which infers new edges by mining entity-local rules. This approach leverages enrichment analysis, a well-established statistical method used to identify mechanisms common to sets of differentially expressed genes. EDGAR's inference results are inherently explainable and rankable, with p-values indicating the statistical significance of each enrichment-based rule. We demonstrate the framework's effectiveness on a large-scale biomedical KG, ROBOKOP, focusing on drug repurposing for Alzheimer disease (AD) as a case study. Initially, we extracted 14 known drugs from the KG and identified 20 contextual biomarkers through enrichment analysis, revealing functional pathways relevant to shared drug efficacy for AD. Subsequently, using the top 1000 enrichment results, our system identified 1246 additional drug candidates for AD treatment. The top 10 candidates were validated using evidence from medical literature. EDGAR is deployed within ROBOKOP, complete with a web user interface. This is the first study to apply enrichment analysis to large graph completion and drug repurposing.
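To make the enrichment step concrete, the following is a minimal sketch of how an enrichment-based rule could be scored and ranked. It assumes a one-sided hypergeometric test, a common choice in enrichment analysis; the abstract does not state which test EDGAR uses. The function names (`hypergeom_sf`, `enrich`), the toy drug and gene identifiers, and the population size are all hypothetical and are not taken from EDGAR or ROBOKOP.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for X ~ Hypergeometric.

    M: population size, n: entities linked to the candidate node,
    N: size of the input set, k: observed overlap.
    """
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def enrich(input_entities, neighbors, population_size):
    """Rank nodes shared by the input entities by enrichment p-value.

    neighbors maps a node id to the set of entities linked to it in the KG
    (an assumed, simplified view of the graph).
    """
    inputs = set(input_entities)
    results = []
    for node, linked in neighbors.items():
        overlap = len(inputs & linked)
        if overlap == 0:
            continue  # node is not connected to any input entity
        p = hypergeom_sf(overlap, population_size, len(linked), len(inputs))
        results.append((node, overlap, p))
    # Smallest p-value first: the most significant rule ranks highest.
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical): 3 known drugs out of a population of 10 drugs.
known_drugs = {"d1", "d2", "d3"}
neighbors = {
    "G1": {"d1", "d2", "d3", "d4"},
    "G2": {"d1", "d5", "d6", "d7", "d8", "d9"},
}
ranked = enrich(known_drugs, neighbors, population_size=10)

# Entities linked to the top-ranked node but absent from the input set
# are the inferred candidates for new edges.
top_node = ranked[0][0]
candidates = neighbors[top_node] - known_drugs
```

In this toy setting, a node linked to all three known drugs and only one other drug receives a much smaller p-value than a node linked to one known drug and five others, so the first node ranks higher and its remaining neighbor becomes the inferred candidate.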