Retrieval Augmented Generation (RAG) has become one of the most popular paradigms for enabling LLMs to access external data, and it also serves as a grounding mechanism to mitigate hallucinations. Implementing RAG poses several challenges, including effective integration of retrieval models, efficient representation learning, data diversity, computational efficiency optimization, evaluation, and the quality of text generation. Given these challenges, new techniques for improving RAG appear almost daily, making it infeasible to experiment with every combination for a given problem. In this context, this paper presents good practices for implementing, optimizing, and evaluating RAG for the Brazilian Portuguese language, focusing on the establishment of a simple pipeline for inference and experiments. We explored a diverse set of methods to answer questions about the first Harry Potter book. To generate the answers we used OpenAI's gpt-4, gpt-4-1106-preview, and gpt-3.5-turbo-1106, and Google's Gemini Pro. Focusing on the quality of the retriever, our approach improved MRR@10 by 35.4% compared to the baseline. When optimizing the input size in the application, we observed that a further gain of 2.4% is possible. Finally, we present the complete architecture of the RAG with our recommendations. As a result, we moved from a baseline of 57.88% to a maximum relative score of 98.61%.
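For readers unfamiliar with the retrieval metric reported above, the following is a minimal sketch of how MRR@10 is commonly computed, assuming each question has exactly one relevant chunk. The function names and the toy chunk identifiers are hypothetical and are not taken from the paper's code; the abstract does not describe the authors' evaluation script.

```python
from typing import Hashable, Sequence


def reciprocal_rank_at_k(
    ranked_ids: Sequence[Hashable], relevant_id: Hashable, k: int = 10
) -> float:
    """Return 1/rank of the relevant item if it is in the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mrr_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevant_ids: Sequence[Hashable],
    k: int = 10,
) -> float:
    """Average the reciprocal rank at k over all questions."""
    if len(rankings) != len(relevant_ids):
        raise ValueError("rankings and relevant_ids must have the same length")
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank_at_k(ranked, relevant, k)
        for ranked, relevant in zip(rankings, relevant_ids)
    )
    return total / len(rankings)


# Toy data (hypothetical): retriever output for three questions.
rankings = [
    ["c3", "c1", "c7"],  # relevant chunk "c1" is at rank 2 -> 1/2
    ["c2", "c9"],        # relevant chunk "c2" is at rank 1 -> 1
    ["c5", "c4", "c8"],  # relevant chunk "c6" is missing   -> 0
]
relevant_ids = ["c1", "c2", "c6"]

score = mrr_at_k(rankings, relevant_ids, k=10)
print(f"MRR@10 = {score:.3f}")
```

Under this definition, a relevant chunk ranked below position 10 contributes zero, so the metric rewards retrievers that place the correct passage near the top of the list.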