Retrieval-Augmented Generation (RAG) has significantly improved the ability of Large Language Models (LLMs) to produce factually accurate and up-to-date responses. However, the performance of a RAG system is not determined by any single component; it emerges from the interplay of modular choices such as embedding models and retrieval algorithms. The result is a vast and often opaque configuration space in which developers struggle to understand performance trade-offs and identify optimal designs. To address this challenge, we present RAGExplorer, a visual analytics system for the systematic comparison and diagnosis of RAG configurations. RAGExplorer guides users through a macro-to-micro analytical workflow. At the macro level, developers survey the performance landscape across many configurations to see which design choices are most effective. At the micro level, they drill down into individual failure cases, investigate how differences in retrieved information contribute to errors, and interactively test hypotheses by manipulating the provided context and observing the impact on the generated answer. We demonstrate the effectiveness of RAGExplorer through detailed case studies and user studies, which validate its ability to help developers navigate the complex RAG design space. Our code and user guide are publicly available at https://github.com/Thymezzz/RAGExplorer.