Managing the rapidly growing scholarly corpus poses significant challenges in representation, reasoning, and efficient analysis. An ideal system should unify structured knowledge management, agentic planning, and interpretable execution to support diverse scholarly queries, from retrieval to knowledge discovery and generation, at scale. Unfortunately, existing retrieval-augmented generation (RAG) and document analytics systems do not support all of these query types simultaneously. To this end, we propose AgenticScholar, an agentic scholarly data management system that integrates a structure-aware knowledge representation layer, an LLM-centric hybrid query planning layer, and a unified execution layer with composable operators. AgenticScholar autonomously translates natural language queries into executable directed acyclic graph (DAG) plans, enabling end-to-end reasoning over multi-modal scholarly data. Extensive experiments demonstrate that AgenticScholar significantly outperforms existing systems in effectiveness, efficiency, and interpretability, offering a practical foundation for future research on agentic scholarly data management.