ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

From arXiv: 9 pages main text plus appendices, 8 figures. Dataset and benchmark paper. ChronoMedKG is released under CC BY 4.0, and ChronoTQA and the code under MIT (Zenodo: 10.5281/zenodo.19697542). Under review.

Biomedical knowledge graphs (KGs) treat disease associations as static facts, but temporal information is crucial for clinical reasoning: a symptom that is diagnostic of one disease at age 3 may point to a different disease at age 13. Existing KGs such as PrimeKG, Hetionet, and iKraph do not encode when a finding becomes clinically relevant over the course of a disease, which limits their usefulness for longitudinal clinical reasoning and retrieval augmentation. We introduce ChronoMedKG, a temporal biomedical knowledge graph of 460,497 evidence-linked triples (filtered from 13M raw extractions) covering 13,431 diseases. Each association is tied to temporal components such as an onset window or progression stage, backed by PMID-traceable evidence and a multi-signal credibility score. The graph is built by a disease-autonomous multi-agent pipeline in which multiple frontier LLMs independently extract knowledge from PubMed and PMC literature; only relations that are supported by multi-model consensus and survive both credibility filtering and ontology alignment are kept. ChronoMedKG achieves 92.7% agreement with Orphadata and adds temporal grounding for 6,250 diseases absent from HPOA, Orphadata, and Phenopackets, including 1,657 Orphanet-coded rare diseases. We further introduce ChronoTQA, a benchmark of 3,341 questions across eight task types (six temporal plus two static controls), with a 12-question supplementary probe. Frontier LLMs lose roughly 30 points moving from static to temporal questions; ChronoMedKG retrieval rescues 47-65% of their long-tail failures, versus 17-29% for HPOA-RAG. ChronoMedKG thus gives retrieval-augmented clinical systems a temporal axis that was previously absent.
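The idea of an evidence-linked, temporally grounded triple can be illustrated with a toy sketch. Everything below is hypothetical: the record layout (`TemporalTriple`), the consensus rule (`min_models`), the credibility floor (`min_cred`), and all disease, finding, model, and PMID values are placeholders, not ChronoMedKG's actual schema, thresholds, or data. The sketch only shows how an onset window lets the same finding map to different diseases at ages 3 and 13, and how a consensus-plus-credibility filter can drop weakly supported triples before retrieval.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TemporalTriple:
    """Hypothetical record for one evidence-linked, temporally grounded association."""
    disease: str
    relation: str
    finding: str
    onset_min_years: float       # start of the onset window (inclusive)
    onset_max_years: float       # end of the onset window (inclusive)
    pmids: tuple[str, ...]       # supporting PubMed IDs (placeholders here)
    models: frozenset[str]       # LLMs that independently extracted this triple
    credibility: float           # assumed combined credibility score in [0, 1]


def keep(t: TemporalTriple, min_models: int = 2, min_cred: float = 0.5) -> bool:
    """Assumed filter: require evidence, multi-model consensus, and a credibility floor."""
    return bool(t.pmids) and len(t.models) >= min_models and t.credibility >= min_cred


def diseases_for(finding: str, age: float, triples: Iterable[TemporalTriple]) -> list[str]:
    """Return diseases whose onset window for `finding` contains `age`, after filtering."""
    return sorted({
        t.disease
        for t in triples
        if keep(t) and t.finding == finding and t.onset_min_years <= age <= t.onset_max_years
    })


# Toy data: names, windows, PMIDs, and scores are made up for illustration.
triples = [
    TemporalTriple("disease_A", "has_phenotype", "symptom_X", 1, 5,
                   ("PMID_1",), frozenset({"model_1", "model_2"}), 0.9),
    TemporalTriple("disease_B", "has_phenotype", "symptom_X", 10, 16,
                   ("PMID_2",), frozenset({"model_1", "model_3"}), 0.8),
    # Extracted by a single model only, so the assumed consensus rule drops it.
    TemporalTriple("disease_C", "has_phenotype", "symptom_X", 10, 16,
                   ("PMID_3",), frozenset({"model_2"}), 0.95),
]

print(diseases_for("symptom_X", 3, triples))   # ['disease_A']
print(diseases_for("symptom_X", 13, triples))  # ['disease_B']
```

Because the filter runs before the age lookup in this sketch, a triple that fails consensus (here `disease_C`) never reaches retrieval, however well its onset window matches the query age.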
