Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that draw on external knowledge bases. Because real-world RAG systems are complex, erroneous outputs can have many potential causes. Understanding the range of errors that occur in practice is crucial for robust deployment. We present a new taxonomy of the error types that can arise in realistic RAG systems, together with examples of each type and practical advice for addressing them. We also curate a dataset of erroneous RAG responses annotated by error type. Finally, we propose an auto-evaluation method, aligned with our taxonomy, that can be used during development to track and address errors. Code and data are available at https://github.com/layer6ai-labs/rag-error-classification.