Retrieval-augmented generation (RAG) combines document retrieval with large language models to produce responses grounded in external evidence. While several R packages support core components of RAG workflows, integrated evaluation of RAG systems in R remains limited and is often conducted through Python-based tools, most notably the RAG assessment (RAGAS) framework. To address this gap, we introduce ragR, an R package that unifies document ingestion, embedding and vector storage, similarity-based retrieval, grounded generation, structured question-answer logging, and RAGAS-style evaluation within a single R-native workflow. The current implementation provides LLM-based scoring for four core RAGAS metrics: context precision, context recall, faithfulness, and answer relevance. Validation experiments under controlled settings show that ragR reproduces metric behavior similar to that of the reference Python RAGAS workflow across multiple use cases. By integrating RAG construction and evaluation within a reproducible workflow in R, ragR provides a practical framework for research, teaching, and moderate-scale experimentation on RAG systems entirely within the R ecosystem.