Historical medical archives and traditional medicines hold immense potential for drug discovery and remain a primary source for current drug development. However, their pre-ontological prose and idiosyncratic taxonomies prevent the data from being standardized and modernized for use in current biomedical pipelines. Furthermore, no existing LLM agent system, whether tool-calling, retrieval-augmented, or agentic deep-research, can convert such text into verifiable drug-discovery leads at scale. We close this gap with DeepRoot, a multi-agent LLM system that jointly builds and uses a verified knowledge graph (KG), showing that grounding and reasoning, though often conflated, are separable axes that the system can compose for therapeutic reasoning. Applied to the Shen Nong Ben Cao Jing, DeepRoot recovers $10$ of $21$ held-out compound-disease treatment pairs at R@$20$ ($47.6\%$, vs. $4.8\%$ for a raw-corpus LLM and $\sim\!2.4\%$ for random ranking). In an LLM-as-judge audit of reasoning quality, it also leads both baseline LLMs and LLMs with direct tool-call access to the same APIs that DeepRoot queries. Tool-using LLMs hallucinate evidence on $87\%$ of claims, versus $7$--$10\%$ for DeepRoot. Graph-only inference hallucinates on $0\%$ of claims but ranks lowest on reasoning coherence; DeepRoot KG+LLM is the only condition to win on both axes, pointing toward a route for systematic mining and repurposing of historical medical knowledge.
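The R@$20$ figures above follow from simple counting, and a minimal sketch may make the arithmetic concrete. It assumes that R@$k$ means the fraction of held-out (compound, disease) pairs whose disease appears among the top $k$ candidates ranked for that compound; the abstract does not spell out the exact protocol, so this definition, the function name `recall_at_k`, and the toy identifiers are illustrative assumptions rather than DeepRoot's actual evaluation code.

```python
from typing import Dict, Iterable, List, Tuple


def recall_at_k(
    rankings: Dict[str, List[str]],
    held_out: Iterable[Tuple[str, str]],
    k: int = 20,
) -> float:
    """Fraction of held-out (compound, disease) pairs recovered in the top k.

    ASSUMPTION: a pair counts as recovered when the disease appears among
    the first k ranked candidates for its compound. This is a common reading
    of R@k, not necessarily the paper's exact protocol.
    """
    pairs = list(held_out)
    if not pairs:
        raise ValueError("held_out must contain at least one pair")
    hits = sum(
        1
        for compound, disease in pairs
        if disease in rankings.get(compound, [])[:k]
    )
    return hits / len(pairs)


# Toy data with placeholder identifiers (hypothetical, not from the paper).
toy_rankings = {"c1": ["d1", "d2", "d3"], "c2": ["d4", "d5"]}
toy_held_out = [("c1", "d2"), ("c2", "d9"), ("c1", "d3")]
toy_recall = recall_at_k(toy_rankings, toy_held_out, k=2)  # 1 of 3 pairs recovered

# The reported percentages are consistent with counts out of 21 held-out pairs.
deeproot_pct = round(100 * 10 / 21, 1)   # 10 of 21 pairs
raw_corpus_pct = round(100 * 1 / 21, 1)  # 1 of 21 pairs
```

Under this reading, $10/21$ rounds to $47.6\%$ and the raw-corpus figure of $4.8\%$ corresponds to a single recovered pair out of $21$. The random baseline is not reproduced here because it depends on the size of the candidate pool, which the abstract does not state.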