Biomedical knowledge resources often either preserve evidence as unstructured text or compress it into flat triples that omit study design, provenance, and quantitative support. Here we present EvidenceNet, a framework and dataset for building disease-specific knowledge graphs from full-text biomedical literature. EvidenceNet uses a large language model (LLM)-assisted pipeline to extract experimentally grounded findings as structured evidence nodes, normalize biomedical entities, score evidence quality, and connect evidence records through typed semantic relations. We release two resources: EvidenceNet-HCC with 7,872 evidence records, 10,328 graph nodes, and 49,756 edges, and EvidenceNet-CRC with 6,622 records, 8,795 nodes, and 39,361 edges. Technical validation shows high component fidelity, including 98.3% field-level extraction accuracy, 100.0% high-confidence entity-link accuracy, 87.5% fusion integrity, and 90.0% semantic relation-type accuracy. In downstream evaluation, EvidenceNet improves internal and external retrieval-augmented question answering and retains structural signal for future link prediction and target prioritization. These results establish EvidenceNet as a disease-specific resource for evidence-aware biomedical reasoning and hypothesis generation.
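To make the contrast with flat triples concrete, the following is a minimal sketch of how an evidence record and its typed relations might be represented. It is not the EvidenceNet schema: every class, field, and relation name here (`EvidenceRecord`, `study_design`, `quality_score`, `"supports"`, and so on) is a hypothetical choice for illustration, and the example values are toy placeholders rather than entries from the released datasets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    """One experimentally grounded finding (all field names are hypothetical)."""
    record_id: str
    source_id: str                       # provenance: identifier of the source article
    study_design: str                    # e.g. "in vitro", "cohort"
    entities: tuple[str, ...]            # normalized entity identifiers
    finding: str                         # short statement of the result
    effect_size: Optional[float] = None  # quantitative support, if reported
    quality_score: float = 0.0           # assumed here to lie in [0, 1]


@dataclass
class EvidenceGraph:
    """Evidence records connected by typed, directed relations."""
    records: dict[str, EvidenceRecord] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_record(self, record: EvidenceRecord) -> None:
        self.records[record.record_id] = record

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Refuse edges that point at records the graph does not contain.
        if src not in self.records or dst not in self.records:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append((src, relation, dst))

    def neighbors(self, record_id: str, relation: Optional[str] = None) -> list[str]:
        # Outgoing neighbors, optionally filtered by relation type.
        return [
            dst
            for src, rel, dst in self.edges
            if src == record_id and (relation is None or rel == relation)
        ]


# Toy placeholder data, not taken from EvidenceNet-HCC or EvidenceNet-CRC.
graph = EvidenceGraph()
graph.add_record(EvidenceRecord(
    "ev1", "doc-001", "in vitro", ("GENE:A", "DISEASE:X"),
    "placeholder finding", effect_size=1.8, quality_score=0.7,
))
graph.add_record(EvidenceRecord(
    "ev2", "doc-002", "cohort", ("GENE:A", "DISEASE:X"),
    "placeholder finding", quality_score=0.9,
))
graph.relate("ev2", "supports", "ev1")
print(graph.neighbors("ev2", "supports"))
```

Unlike a bare (subject, predicate, object) triple, each record in this sketch carries its own study design, source identifier, and optional quantitative value, so a downstream retriever could filter or rank evidence on those fields.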