Retrieval-Augmented Generation (RAG) systems have become widely popular for augmenting Large Language Model (LLM) outputs with domain-specific and time-sensitive data. Recently, a shift has begun from simple RAG setups, which query a vector database for additional information on every user input, toward more sophisticated forms of RAG. However, competing concrete approaches are currently supported mostly by anecdotal evidence. In this paper, we present a rigorous dataset creation and evaluation workflow to quantitatively compare different RAG strategies. We use a dataset created this way to develop and evaluate a boolean agent RAG setup: a system in which an LLM can decide whether or not to query a vector database, thus saving tokens on questions that can be answered from internal knowledge alone. We publish our code and the generated dataset online.
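The boolean agent RAG idea described above can be sketched as a simple control flow: the model first makes a yes/no retrieval decision, and only on "yes" does the system pay the cost of a vector-database query. The sketch below is a minimal illustration under assumed interfaces; `mock_llm` and `mock_vector_search` are hypothetical stand-ins for a real LLM endpoint and vector store, not the paper's actual implementation.

```python
def mock_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call. For the decision prompt it
    # answers YES only for questions mentioning recent data (here: "2024").
    if prompt.startswith("Decide:"):
        return "YES" if "2024" in prompt else "NO"
    return f"Answer based on: {prompt}"

def mock_vector_search(query: str) -> str:
    # Hypothetical stand-in for a vector-database similarity search.
    return f"[retrieved context for '{query}']"

def boolean_agent_rag(question: str) -> str:
    # Step 1: the LLM makes a boolean decision on whether retrieval is needed.
    decision = mock_llm(
        f"Decide: does answering '{question}' require external data? Reply YES or NO."
    )
    if decision.strip().upper() == "YES":
        # Step 2a: query the vector database and answer with retrieved context.
        context = mock_vector_search(question)
        return mock_llm(f"{context}\nQuestion: {question}")
    # Step 2b: answer from internal knowledge, skipping retrieval entirely
    # and saving the tokens a context-stuffed prompt would consume.
    return mock_llm(question)
```

The token savings come from the "NO" branch: questions answerable from internal knowledge never incur the retrieved-context prompt overhead.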