We introduce RAGAs (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines. RAG systems consist of a retrieval module and an LLM-based generation module. They supply the LLM with knowledge from a reference textual database, which lets it act as a natural language layer between a user and that database and reduces the risk of hallucinations. Evaluating RAG architectures is challenging, however, because several dimensions must be considered: the ability of the retrieval system to identify relevant and focused context passages, the ability of the LLM to exploit such passages faithfully, and the quality of the generation itself. With RAGAs, we put forward a suite of metrics that can be used to evaluate these different dimensions \textit{without having to rely on ground truth human annotations}. We posit that such a framework can contribute crucially to faster evaluation cycles of RAG architectures, which is especially important given the fast adoption of LLMs.