Biomedical knowledge graphs (KGs) treat disease associations as static facts, yet temporal information is crucial for clinical reasoning: a symptom that points to one disease at age 3 may point to a different disease at age 13. Existing KGs such as PrimeKG, Hetionet, and iKraph do not encode when a finding becomes clinically relevant over the course of a disease, which limits their usefulness for longitudinal clinical reasoning and retrieval augmentation. We introduce ChronoMedKG, a temporal biomedical knowledge graph containing 460,497 evidence-linked triples (filtered from 13M raw extractions) that cover 13,431 diseases. Each association is tied to temporal components, such as an onset window or progression stage, that are backed by PMID-traceable evidence and a multi-signal credibility score. The graph is constructed through a disease-autonomous multi-agent pipeline in which multiple frontier LLMs independently extract knowledge from PubMed and PMC literature. A relation is kept only if it is supported by multi-model consensus and survives both credibility filtering and ontology alignment. ChronoMedKG reaches 92.7% agreement with Orphadata and adds temporal grounding for 6,250 diseases absent from HPOA, Orphadata, and Phenopackets, including 1,657 Orphanet-coded rare diseases. We further introduce ChronoTQA, a benchmark of 3,341 questions across eight task types (six temporal plus two static controls), with a 12-question supplementary probe. Frontier LLMs lose roughly 30 points when moving from static to temporal questions; ChronoMedKG retrieval rescues 47-65% of their long-tail failures, compared with 17-29% for HPOA-RAG. ChronoMedKG thus provides a temporal axis that retrieval-augmented clinical systems previously lacked.
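To make the idea of a temporally grounded association concrete, the following is a minimal sketch of what one such triple and the keep/discard rule might look like. It is not the ChronoMedKG schema or pipeline: the field names, the `keep` and `relevant_at_age` helpers, the thresholds (`min_models=2`, `min_credibility=0.5`), the placeholder disease names, and the PMID strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TemporalTriple:
    """Hypothetical record for one disease-finding association."""
    disease: str
    relation: str
    finding: str
    onset_min_years: float        # start of the onset window (inclusive)
    onset_max_years: float        # end of the onset window (inclusive)
    pmids: Tuple[str, ...]        # literature evidence backing the association
    supporting_models: int        # how many LLMs extracted this relation
    credibility: float            # assumed to be a score in [0, 1]
    ontology_id: Optional[str]    # None if ontology alignment failed


def keep(t: TemporalTriple, min_models: int = 2,
         min_credibility: float = 0.5) -> bool:
    """Assumed filter: consensus, credibility, evidence, and ontology alignment."""
    return (
        t.supporting_models >= min_models
        and t.credibility >= min_credibility
        and len(t.pmids) > 0
        and t.ontology_id is not None
    )


def relevant_at_age(triples: List[TemporalTriple], finding: str,
                    age_years: float) -> List[TemporalTriple]:
    """Return associations for a finding whose onset window contains the age."""
    return [
        t for t in triples
        if t.finding == finding
        and t.onset_min_years <= age_years <= t.onset_max_years
    ]


# Placeholder data only; none of these values come from the graph itself.
triples = [
    TemporalTriple("disease_A", "has_phenotype", "finding_X",
                   0.0, 5.0, ("PMID_1",), 3, 0.9, "ONT:0001"),
    TemporalTriple("disease_B", "has_phenotype", "finding_X",
                   10.0, 18.0, ("PMID_2",), 2, 0.8, "ONT:0002"),
    # Extracted by a single model only, so the consensus rule drops it.
    TemporalTriple("disease_C", "has_phenotype", "finding_X",
                   0.0, 5.0, ("PMID_3",), 1, 0.9, "ONT:0003"),
]

kept = [t for t in triples if keep(t)]
at_age_3 = [t.disease for t in relevant_at_age(kept, "finding_X", 3)]
at_age_13 = [t.disease for t in relevant_at_age(kept, "finding_X", 13)]
```

In this toy setup the same finding resolves to a different candidate disease depending on the age passed to `relevant_at_age`, which is the kind of lookup a static graph without onset windows cannot answer.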