FAIR-RAG: Faithful Adaptive Iterative Refinement for Retrieval-Augmented Generation

From arXiv (30 pages, 5 figures, 5 tables). Keywords: Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), Agentic AI, Multi-hop Question Answering, Faithfulness

While Retrieval-Augmented Generation (RAG) mitigates hallucination and knowledge staleness in Large Language Models (LLMs), existing frameworks often falter on complex, multi-hop queries that require synthesizing information from disparate sources. Current advanced RAG methods, which employ iterative or adaptive strategies, lack a robust mechanism for systematically identifying and filling evidence gaps; as a result, they often propagate noise or fail to gather a comprehensive context. We introduce FAIR-RAG, a novel agentic framework that transforms the standard RAG pipeline into a dynamic, evidence-driven reasoning process. At its core is an Iterative Refinement Cycle governed by a module we term Structured Evidence Assessment (SEA). SEA acts as an analytical gating mechanism: it deconstructs the initial query into a checklist of required findings, then audits the aggregated evidence to identify confirmed facts and, critically, explicit informational gaps. These gaps provide a precise signal to an Adaptive Query Refinement agent, which generates new, targeted sub-queries to retrieve the missing information. The cycle repeats until the evidence is verified as sufficient, ensuring a comprehensive context for a final, strictly faithful generation step. We conducted experiments on challenging multi-hop QA benchmarks, including HotpotQA, 2WikiMultiHopQA, and MuSiQue. In a unified experimental setup, FAIR-RAG significantly outperforms strong baselines. On HotpotQA, it achieves an F1 score of 0.453, an absolute improvement of 8.3 points over the strongest iterative baseline, establishing a new state of the art for this class of methods on these benchmarks. Our work demonstrates that a structured, evidence-driven refinement process with explicit gap analysis is crucial for reliable and accurate reasoning in advanced RAG systems on complex, knowledge-intensive tasks.
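The control flow of the Iterative Refinement Cycle can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In FAIR-RAG the query decomposition, evidence assessment, query refinement, and generation steps are performed by LLM-based agents. Here they are passed in as plain callables and exercised with hypothetical toy stand-ins: a two-entry keyword "corpus", substring matching as the test for whether a passage supports a finding, and an answer formed by joining the evidence. The names `assess_evidence`, `iterative_refinement`, `toy_retrieve`, `toy_refine`, and the `max_iters` safety cap are all illustrative assumptions. The abstract itself only says the cycle repeats until the evidence is sufficient.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Toy stand-in for the output of Structured Evidence Assessment (SEA)."""
    confirmed: dict[str, str] = field(default_factory=dict)  # finding -> supporting passage
    gaps: list[str] = field(default_factory=list)            # findings with no support yet


def assess_evidence(checklist: list[str], evidence: list[str],
                    supports: Callable[[str, str], bool]) -> Assessment:
    """Audit the aggregated evidence against each required finding."""
    result = Assessment()
    for finding in checklist:
        passage = next((p for p in evidence if supports(finding, p)), None)
        if passage is None:
            result.gaps.append(finding)         # explicit informational gap
        else:
            result.confirmed[finding] = passage  # confirmed fact
    return result


def iterative_refinement(query: str,
                         decompose: Callable[[str], list[str]],
                         retrieve: Callable[[str], list[str]],
                         refine: Callable[[str, Assessment], list[str]],
                         supports: Callable[[str, str], bool],
                         generate: Callable[[str, list[str]], str],
                         max_iters: int = 3) -> tuple[str, Assessment]:
    checklist = decompose(query)        # checklist of required findings
    evidence = list(retrieve(query))    # initial retrieval for the original query
    assessment = assess_evidence(checklist, evidence, supports)
    for _ in range(max_iters):          # assumed cap; the loop otherwise ends on sufficiency
        if not assessment.gaps:         # evidence judged sufficient: stop refining
            break
        # Gaps drive new, targeted sub-queries; only unseen passages are added.
        for sub_query in refine(query, assessment):
            for passage in retrieve(sub_query):
                if passage not in evidence:
                    evidence.append(passage)
        assessment = assess_evidence(checklist, evidence, supports)
    # The final answer is generated from the aggregated evidence only.
    return generate(query, evidence), assessment


# --- Hypothetical toy components for a two-hop question ---
CORPUS = {
    "Film X": "Film X was directed by Jane Doe.",
    "Jane Doe": "Jane Doe was born in Lyon.",
}


def toy_retrieve(q: str) -> list[str]:
    # Keyword retrieval: return passages whose key appears in the query.
    return [text for key, text in CORPUS.items() if key in q]


def toy_refine(query: str, a: Assessment) -> list[str]:
    # Name the entity confirmed so far in a sub-query for each remaining gap.
    credit = a.confirmed.get("directed by", "")
    name = credit.split("directed by ")[-1].rstrip(".") if credit else ""
    return [f"{name} {gap}".strip() for gap in a.gaps]


def run(max_iters: int = 3) -> tuple[str, Assessment]:
    return iterative_refinement(
        "Where was the director of Film X born?",
        decompose=lambda q: ["directed by", "born in"],
        retrieve=toy_retrieve,
        refine=toy_refine,
        supports=lambda finding, passage: finding in passage,
        generate=lambda q, ev: " ".join(ev),
        max_iters=max_iters,
    )


answer, final = run()
print(answer)
```

In this toy run the initial retrieval only finds who directed Film X, so the assessment reports the birthplace as a gap. The refinement step then issues a sub-query naming Jane Doe, which retrieves the missing passage and closes the gap before generation. With `max_iters=0` the gap is never filled, which shows why gap analysis, rather than a single retrieval pass, matters for multi-hop questions.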
