LLM-based research agents have advanced through agent orchestration but have largely overlooked the orchestration of scientific knowledge. Existing work often reduces papers to abstracts, surface mentions, and flat \texttt{cites} edges, omitting the key entities, claims, evidence, mechanisms, and method lineages that scientific reasoning requires. We introduce \textbf{Agents-K1}, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs. Agents-K1 integrates three components under a unifying theoretical foundation: (i) a multimodal parser whose five-module schema captures entities, multimodal evidence, citations, and typed inter-entity relations across the full paper rather than the abstract alone; (ii) a 4B information-extraction backbone trained with GRPO under a rule-based reward; and (iii) the graphanything CLI, a tri-source agent interface that unifies web search, multimodal graph retrieval, and cross-document traversal. Using this pipeline, we process 2.46 million scientific papers across six subjects to produce \textbf{Scholar-KG}. We release a one-million-paper subset, and the full Scholar-KG is accessible via the SCP link below. The same pipeline can be extended to general-domain corpora and to schema-conformant data synthesis. Extensive experiments show that Agents-K1 achieves superior performance in scientific information extraction, knowledge graph construction, and multi-hop scientific reasoning.
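To illustrate the contrast drawn above between flat \texttt{cites} edges and typed inter-entity relations, the following is a minimal Python sketch of a typed graph with multi-hop traversal. It is not the Agents-K1 schema or the graphanything interface: the class names, the relation labels (\texttt{proposes}, \texttt{extends}), and the node identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # hypothetical node identifier, e.g. "paper:A"
    relation: str  # hypothetical typed relation label, e.g. "extends"
    target: str


class TypedGraph:
    """Toy directed graph whose edges carry a relation type."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, source, relation, target):
        self._out[source].append(Edge(source, relation, target))

    def traverse(self, start, relations):
        """Follow a fixed sequence of relation types from `start`.

        Returns the set of nodes reached after the last hop.
        """
        frontier = {start}
        for rel in relations:
            frontier = {
                edge.target
                for node in frontier
                for edge in self._out.get(node, [])
                if edge.relation == rel
            }
        return frontier


graph = TypedGraph()
# Typed relations record *how* nodes are connected (illustrative data only).
graph.add("paper:A", "proposes", "method:M2")
graph.add("method:M2", "extends", "method:M1")
graph.add("paper:B", "proposes", "method:M1")
# A flat citation edge only records that a link exists.
graph.add("paper:A", "cites", "paper:B")

# Two-hop query: which methods does the method proposed in paper A extend?
lineage = graph.traverse("paper:A", ["proposes", "extends"])
print(lineage)
```

In this toy example, the citation edge alone says only that paper A points to paper B, whereas the typed path lets a query recover the method lineage directly.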