Retrieval-Augmented Generation (RAG) in open-domain settings faces two significant challenges: irrelevant information in retrieved documents and misalignment between generated answers and user intent. We present HiFi-RAG (Hierarchical Filtering RAG), the winning closed-source system in the Text-to-Text static evaluation of the MMU-RAGent NeurIPS 2025 Competition. Our approach moves beyond standard embedding-based retrieval via a multi-stage pipeline. We leverage the speed and cost-efficiency of Gemini 2.5 Flash (4-6x cheaper than Pro) for query formulation, hierarchical content filtering, and citation attribution, while reserving the reasoning capabilities of Gemini 2.5 Pro for final answer generation. On the MMU-RAGent validation set, our system outperforms the baseline, improving ROUGE-L to 0.274 (+19.6%) and DeBERTaScore to 0.677 (+6.2%). On Test2025, our custom dataset of questions that require post-cutoff knowledge (after January 2025), HiFi-RAG outperforms the parametric baseline by 57.4% in ROUGE-L and 14.9% in DeBERTaScore.
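The division of labor described in the abstract, with a fast model for the auxiliary stages and a stronger model for the final answer, can be illustrated with a minimal sketch. This is not the authors' implementation. The `LLM` and `Search` callables, the prompt wording, the yes/no relevance protocol, and the two-level (document, then passage) reading of "hierarchical filtering" are all assumptions made for illustration. The model strings are passed through to whatever client the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the authors' API).
LLM = Callable[[str, str], str]        # (model_name, prompt) -> completion text
Search = Callable[[str], List["Doc"]]  # query -> retrieved documents

FAST_MODEL = "gemini-2.5-flash"   # cheaper model for the auxiliary stages
STRONG_MODEL = "gemini-2.5-pro"   # reserved for final answer generation


@dataclass
class Doc:
    url: str
    passages: List[str]


def is_yes(reply: str) -> bool:
    """Interpret a model reply as a yes/no relevance verdict."""
    return reply.strip().lower().startswith("yes")


def hifi_rag_sketch(question: str, llm: LLM, search: Search) -> dict:
    # Stage 1: query formulation with the fast model.
    query = llm(FAST_MODEL, f"Rewrite as a search query: {question}").strip()

    # Stage 2: retrieval through any search backend.
    docs = search(query)

    # Stage 3: hierarchical filtering, assumed here to mean two levels:
    # drop whole documents first, then drop passages inside the survivors.
    kept: List[Tuple[str, str]] = []
    for doc in docs:
        preview = doc.passages[0] if doc.passages else ""
        doc_prompt = f"Is this document relevant to '{question}'? yes/no\n{preview}"
        if not is_yes(llm(FAST_MODEL, doc_prompt)):
            continue
        for passage in doc.passages:
            psg_prompt = f"Does this passage help answer '{question}'? yes/no\n{passage}"
            if is_yes(llm(FAST_MODEL, psg_prompt)):
                kept.append((doc.url, passage))

    # Stage 4: final answer from the strong model, using only filtered context.
    context = "\n".join(f"[{i + 1}] {p}" for i, (_, p) in enumerate(kept))
    answer = llm(STRONG_MODEL, f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

    # Stage 5: citation attribution with the fast model.
    cite_prompt = f"Add [n] citations to this answer.\nContext:\n{context}\nAnswer: {answer}"
    cited = llm(FAST_MODEL, cite_prompt)
    return {"query": query, "answer": cited, "sources": [url for url, _ in kept]}
```

In this arrangement the strong model is called exactly once per question, while the number of fast-model calls grows with the number of retrieved documents and passages. That is where the cost difference between the two models would matter most.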