Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to capture latent technical details hidden in referenced papers. Furthermore, previous approaches tend to overlook valuable implementation-level code signals and lack structured knowledge representations that support multi-granular retrieval and reuse. To overcome these challenges, we propose Executable Knowledge Graphs (xKG), a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. When integrated into three agent frameworks with two different LLMs, xKG yields substantial performance gains (10.9% with o3-mini) on PaperBench, demonstrating its effectiveness as a general and extensible solution for automated AI research replication. Code will be released at https://github.com/zjunlp/xKG.