When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering - 专知论文

会员服务 ·

0

多跳 · MoDELS · 控制器 · 自动问答 · 知识 (knowledge) ·

When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering

翻译：暂无翻译

Mahdi Astaraki,Mohammad Arshi Saloot,Ali Shiraee Kasmaee,Hamidreza Mahyar,Soheila Samiee

from arxiv, 51 pages, 29 figures

Retrieval-Augmented Generation (RAG) extends large language models (LLMs) beyond parametric knowledge, yet it is unclear when iterative retrieval-reasoning loops meaningfully outperform static RAG, particularly in scientific domains with multi-hop reasoning, sparse domain knowledge, and heterogeneous evidence. We provide the first controlled, mechanism-level diagnostic study of whether synchronized iterative retrieval and reasoning can surpass an idealized static upper bound (Gold Context) RAG. We benchmark eleven state-of-the-art LLMs under three regimes: (i) No Context, measuring reliance on parametric memory; (ii) Gold Context, where all oracle evidence is supplied at once; and (iii) Iterative RAG, a training-free controller that alternates retrieval, hypothesis refinement, and evidence-aware stopping. Using the chemistry-focused ChemKGMultiHopQA dataset, we isolate questions requiring genuine retrieval and analyze behavior with diagnostics spanning retrieval coverage gaps, anchor-carry drop, query quality, composition fidelity, and control calibration. Across models, Iterative RAG consistently outperforms Gold Context, with gains up to 25.6 percentage points, especially for non-reasoning fine-tuned models. Staged retrieval reduces late-hop failures, mitigates context overload, and enables dynamic correction of early hypothesis drift, but remaining failure modes include incomplete hop coverage, distractor latch trajectories, early stopping miscalibration, and high composition failure rates even with perfect retrieval. Overall, staged retrieval is often more influential than the mere presence of ideal evidence; we provide practical guidance for deploying and diagnosing RAG systems in specialized scientific settings and a foundation for more reliable, controllable iterative retrieval-reasoning frameworks.

翻译：暂无翻译

0

相关内容

生成式人工智能导论：可靠性、负责任开发及实际应用（第二版）

生成式人工智能导论：可靠性、负责任开发及实际应用（第二版）

专知会员服务

28+阅读 · 1月1日

Deep Research（深度研究）：系统性综述

Deep Research（深度研究）：系统性综述

专知会员服务

51+阅读 · 2025年12月3日

【AAAI2026】TruthfulRAG：基于知识图谱解决检索增强生成中的事实层冲突

【AAAI2026】TruthfulRAG：基于知识图谱解决检索增强生成中的事实层冲突

专知会员服务

22+阅读 · 2025年11月15日

科学大语言模型综述：从数据基础到智能体前沿

科学大语言模型综述：从数据基础到智能体前沿

专知会员服务

51+阅读 · 2025年9月1日

【新书】Essential GraphRAG: 知识图谱增强的RAG

【新书】Essential GraphRAG: 知识图谱增强的RAG

专知会员服务

35+阅读 · 2025年7月17日

检索增强生成(RAG)与推理的协同作用：一项系统综述

检索增强生成(RAG)与推理的协同作用：一项系统综述

专知会员服务

16+阅读 · 2025年4月27日

重新思考不确定性：大语言模型时代的关键综述与分析

重新思考不确定性：大语言模型时代的关键综述与分析

专知会员服务

39+阅读 · 2024年11月20日

RAG+LLM=？同济大学等最新《大型语言模型的检索增强生成》综述

RAG+LLM=？同济大学等最新《大型语言模型的检索增强生成》综述

专知会员服务

111+阅读 · 2023年12月19日

《大型语言模型》最新报告，52页ppt，DeepMind Angeliki Lazaridou

《大型语言模型》最新报告，52页ppt，DeepMind Angeliki Lazaridou

专知会员服务

67+阅读 · 2022年9月17日

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

专知会员服务

49+阅读 · 2020年5月26日

KG 高引论文解读两篇 | 两种模型：多层卷积神经网络、知识感知路径递归网络

KG 高引论文解读两篇 | 两种模型：多层卷积神经网络、知识感知路径递归网络

学术头条

18+阅读 · 2019年12月8日

GAN新书《生成式深度学习》Generative Deep Learning，附379页全文PDF

GAN新书《生成式深度学习》Generative Deep Learning，附379页全文PDF

专知

96+阅读 · 2019年9月30日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

近期语音类前沿论文

近期语音类前沿论文

深度学习每日摘要

14+阅读 · 2019年3月17日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

论文浅尝 | 基于知识图谱子图匹配以回答自然语言问题

论文浅尝 | 基于知识图谱子图匹配以回答自然语言问题

开放知识图谱

26+阅读 · 2018年6月26日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

专知

10+阅读 · 2018年4月22日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

汉英篇章衔接对齐资源构建与分析研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于格值逻辑的语言真值α-群锁语义归结自动推理研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于复杂数据的回归模型统计推断及其应用

国家自然科学基金

3+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

强调与对比影响语篇理解的认知过程及其神经机制

国家自然科学基金

4+阅读 · 2015年12月31日

基于相依数据的梯度学习理论研究

国家自然科学基金

1+阅读 · 2015年12月31日

维吾尔语韵律结构的分析与预测模型的研究

国家自然科学基金

0+阅读 · 2014年12月31日

一个组合猜想及其相关问题的研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于脑电信号的藏语拉萨话韵律认知理论研究

国家自然科学基金

0+阅读 · 2014年12月31日

种群遗传学的多人交互式学习研究

国家自然科学基金

0+阅读 · 2014年12月31日

Diffusion Language Models: An Experimental Analysis

Arxiv

0+阅读 · 6月17日

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

Arxiv

0+阅读 · 6月17日

Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

Arxiv

0+阅读 · 6月16日

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

Arxiv

0+阅读 · 6月16日

RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

Arxiv

0+阅读 · 6月4日

CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

Arxiv

0+阅读 · 6月2日

An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG

Arxiv

0+阅读 · 5月25日

Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

Arxiv

0+阅读 · 5月14日

Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Arxiv

0+阅读 · 5月12日

ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation

Arxiv

0+阅读 · 5月11日

VIP会员

文章信息

相关主题

知识 (knowledge)

最新内容

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

1+阅读 · 今天14:45

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

1+阅读 · 今天14:43

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

3+阅读 · 今天14:31

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

3+阅读 · 今天14:20

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

2+阅读 · 今天14:11

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

3+阅读 · 今天14:07

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

3+阅读 · 今天14:03

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

2+阅读 · 今天13:59

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

5+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

8+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

7+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

4+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

5+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

5+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

8+阅读 · 6月22日

相关VIP内容

生成式人工智能导论：可靠性、负责任开发及实际应用（第二版）

生成式人工智能导论：可靠性、负责任开发及实际应用（第二版）

专知会员服务

28+阅读 · 1月1日

Deep Research（深度研究）：系统性综述

Deep Research（深度研究）：系统性综述

专知会员服务

51+阅读 · 2025年12月3日

【AAAI2026】TruthfulRAG：基于知识图谱解决检索增强生成中的事实层冲突

【AAAI2026】TruthfulRAG：基于知识图谱解决检索增强生成中的事实层冲突

专知会员服务

22+阅读 · 2025年11月15日

科学大语言模型综述：从数据基础到智能体前沿

科学大语言模型综述：从数据基础到智能体前沿

专知会员服务

51+阅读 · 2025年9月1日

【新书】Essential GraphRAG: 知识图谱增强的RAG

【新书】Essential GraphRAG: 知识图谱增强的RAG

专知会员服务

35+阅读 · 2025年7月17日

检索增强生成(RAG)与推理的协同作用：一项系统综述

检索增强生成(RAG)与推理的协同作用：一项系统综述

专知会员服务

16+阅读 · 2025年4月27日

重新思考不确定性：大语言模型时代的关键综述与分析

重新思考不确定性：大语言模型时代的关键综述与分析

专知会员服务

39+阅读 · 2024年11月20日

RAG+LLM=？同济大学等最新《大型语言模型的检索增强生成》综述

RAG+LLM=？同济大学等最新《大型语言模型的检索增强生成》综述

专知会员服务

111+阅读 · 2023年12月19日

《大型语言模型》最新报告，52页ppt，DeepMind Angeliki Lazaridou

《大型语言模型》最新报告，52页ppt，DeepMind Angeliki Lazaridou

专知会员服务

67+阅读 · 2022年9月17日

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

专知会员服务

49+阅读 · 2020年5月26日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 世界动作模型：少做梦，多行动

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

美以伊冲突：无人机与人工智能的运用

相关资讯

KG 高引论文解读两篇 | 两种模型：多层卷积神经网络、知识感知路径递归网络

KG 高引论文解读两篇 | 两种模型：多层卷积神经网络、知识感知路径递归网络

学术头条

18+阅读 · 2019年12月8日

GAN新书《生成式深度学习》Generative Deep Learning，附379页全文PDF

GAN新书《生成式深度学习》Generative Deep Learning，附379页全文PDF

专知

96+阅读 · 2019年9月30日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

近期语音类前沿论文

近期语音类前沿论文

深度学习每日摘要

14+阅读 · 2019年3月17日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

论文浅尝 | 基于知识图谱子图匹配以回答自然语言问题

论文浅尝 | 基于知识图谱子图匹配以回答自然语言问题

开放知识图谱

26+阅读 · 2018年6月26日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

专知

10+阅读 · 2018年4月22日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Diffusion Language Models: An Experimental Analysis

Arxiv

0+阅读 · 6月17日

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

Arxiv

0+阅读 · 6月17日

Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

Arxiv

0+阅读 · 6月16日

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

Arxiv

0+阅读 · 6月16日

RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

Arxiv

0+阅读 · 6月4日

CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

Arxiv

0+阅读 · 6月2日

An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG

Arxiv

0+阅读 · 5月25日

Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

Arxiv

0+阅读 · 5月14日

Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Arxiv

0+阅读 · 5月12日

ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation

Arxiv

0+阅读 · 5月11日

相关基金

汉英篇章衔接对齐资源构建与分析研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于格值逻辑的语言真值α-群锁语义归结自动推理研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于复杂数据的回归模型统计推断及其应用

国家自然科学基金

3+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

强调与对比影响语篇理解的认知过程及其神经机制

国家自然科学基金

4+阅读 · 2015年12月31日

基于相依数据的梯度学习理论研究

国家自然科学基金

1+阅读 · 2015年12月31日

维吾尔语韵律结构的分析与预测模型的研究

国家自然科学基金

0+阅读 · 2014年12月31日

一个组合猜想及其相关问题的研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于脑电信号的藏语拉萨话韵律认知理论研究

国家自然科学基金

0+阅读 · 2014年12月31日

种群遗传学的多人交互式学习研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员