Muon collider research spans accelerator physics, detector instrumentation, and high-energy phenomenology, with relevant evidence scattered across a rapidly expanding and heterogeneous body of scientific literature. As high-energy physics (HEP) increasingly explores agent-assisted analysis workflows, efficiently locating, integrating, and verifying scientific evidence becomes an essential capability. While retrieval-augmented generation (RAG) offers a promising framework for scientific question answering, integrating agentic reasoning without compromising retrieval precision remains a key challenge. In this work, we present agentic hybrid RAG, an evidence-grounded RAG framework for muon collider research. The framework combines a hybrid retriever, which integrates sparse lexical and dense semantic retrieval, with an agentic reasoning module for query decomposition, evidence expansion, and grounded answer generation. To enable systematic evaluation, we construct the first benchmark for retrieval-augmented scientific question answering in the muon collider domain, comprising a curated literature corpus together with dedicated retrieval and answer-generation benchmarks covering major detector and physics research topics. Extensive evaluation shows that hybrid retrieval provides the strongest retrieval backbone, while agentic reasoning is most effective for controlled evidence expansion and answer synthesis. Building on this principle, agentic hybrid RAG consistently outperforms representative retrieval and RAG baselines in retrieval effectiveness, answer quality, evidence coverage, and factual grounding. Together, the benchmark and framework provide a foundation for evidence-grounded scientific question answering and for future HEP analysis agents operating over large-scale scientific literature.
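The abstract does not state how the hybrid retriever merges its sparse and dense result lists. As an illustration only, the sketch below assumes reciprocal rank fusion (RRF), one common way to combine ranked lists from a lexical retriever (such as BM25) and a dense embedding retriever; the function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are all hypothetical and are not taken from the framework described here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption: fusion by RRF, where each document scores
    sum over lists of 1 / (k + rank), with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 hits from a sparse (lexical) and a dense (semantic) retriever.
sparse_hits = ["doc_bib", "doc_detector", "doc_pheno"]
dense_hits = ["doc_detector", "doc_accel", "doc_bib"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)  # ['doc_detector', 'doc_bib', 'doc_accel', 'doc_pheno']
```

Documents retrieved by both retrievers (`doc_detector`, `doc_bib`) rise above those found by only one, which is the usual motivation for rank-based fusion: it needs no calibration between BM25 scores and embedding similarities.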