Enterprise retrieval-augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant chunks are then retrieved for a specific user query and passed as context to a synthesizer LLM, which generates the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work proposes a zero-shot adaptation of the standard dense retrieval step for more accurate chunk recall. Specifically, each chunk is first decomposed into atomic statements. A set of synthetic questions is then generated over these atoms (with the chunk as context). Dense retrieval then finds the synthetic questions, and hence the associated chunks, closest to the user query. Retrieval with the atoms is found to yield higher recall than retrieval with the chunks, and a further gain is observed when retrieving with the synthetic questions generated over the atoms. Higher recall at the retrieval step, in turn, enables higher performance of the enterprise LLM using the RAG pipeline.
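The retrieval scheme described above can be illustrated with a minimal sketch. The index maps each chunk to synthetic questions over its atoms; at query time, the closest questions are found and their parent chunks returned. This is only an assumed shape of the pipeline: the hand-written questions stand in for LLM-generated ones, and the toy bag-of-words embedding stands in for a real dense sentence embedder.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" used as a stand-in for a dense
    # sentence embedder; not part of the proposed method.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index: each chunk is represented by synthetic questions generated
# over its atomic statements. The questions here are hypothetical
# hand-written stand-ins for the LLM-generated ones.
index = [
    ("chunk-1", "What year was the data centre migrated?"),
    ("chunk-1", "Which team owns the migration runbook?"),
    ("chunk-2", "How is the retrieval recall evaluated?"),
]

def retrieve(query, k=1):
    # Score every synthetic question against the query, then return
    # the distinct chunks attached to the top-scoring questions.
    q = embed(query)
    scored = sorted(index, key=lambda e: cosine(q, embed(e[1])), reverse=True)
    seen, out = set(), []
    for chunk_id, _ in scored:
        if chunk_id not in seen:
            seen.add(chunk_id)
            out.append(chunk_id)
        if len(out) == k:
            break
    return out
```

A query such as `retrieve("when was the migration?")` matches a synthetic question attached to `chunk-1`, so that chunk is returned as context for the synthesizer LLM. The key design choice is that similarity is computed query-to-question rather than query-to-chunk, which the abstract reports improves recall.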