Corrective Retrieval Augmented Generation (CRAG) improves the robustness of RAG systems by evaluating the quality of retrieved documents and triggering corrective actions. However, the original implementation relies on proprietary components, including the Google Search API and closed model weights, limiting reproducibility. In this work, we present a fully open-source reproduction of CRAG, replacing the proprietary web search with the Wikipedia API and the original LLaMA-2 generator with Phi-3-mini-4k-instruct. We evaluate on PopQA and ARC-Challenge, demonstrating that our open-source pipeline achieves performance comparable to the original system. Furthermore, we contribute the first explainability analysis of CRAG's T5-based retrieval evaluator using SHAP, revealing that the evaluator relies primarily on named-entity alignment rather than semantic similarity. Our analysis identifies key failure modes, including domain-transfer limitations on science questions. All code and results are available at https://github.com/suryayalavarthi/crag-reproduction.