Enterprise retrieval-augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Chunks relevant to a user query are then retrieved and passed as context to a synthesizer LLM, which generates the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work applies a zero-shot adaptation of the standard dense retrieval step for more accurate chunk recall. Specifically, each chunk is first decomposed into atomic statements. A set of synthetic questions is then generated over these atoms (with the chunk as context). Dense retrieval then finds the synthetic questions, and their associated chunks, closest to the user query. Retrieval over the atoms is found to yield higher recall than retrieval over the chunks, and a further gain is observed when retrieving with the synthetic questions generated over the atoms. Higher recall at the retrieval step in turn enables higher performance of the enterprise LLM using the RAG pipeline.
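The retrieval step described above can be sketched in a few lines. In this minimal sketch, `embed` is a toy bag-of-words placeholder standing in for a real dense encoder, and the atom decomposition and synthetic-question generation (performed by an LLM in the pipeline) are assumed to have already produced `questions_per_chunk`; all function names here are illustrative, not from the paper.

```python
import math

def embed(text):
    # Placeholder embedding: bag-of-words token counts hashed into a
    # fixed-size, L2-normalized vector. A real pipeline would use a
    # dense encoder model here instead.
    vec = [0.0] * 64
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are pre-normalized, so cosine similarity is a dot product.
    return sum(x * y for x, y in zip(a, b))

def build_index(chunks, questions_per_chunk):
    # Index each synthetic question, keeping a pointer back to its
    # source chunk (the unit actually returned to the synthesizer LLM).
    return [(embed(q), chunk)
            for chunk, qs in zip(chunks, questions_per_chunk)
            for q in qs]

def retrieve(query, index, k=1):
    # Rank synthetic questions by similarity to the user query and
    # return the chunks of the top matches, deduplicated in rank order.
    q_vec = embed(query)
    ranked = sorted(index, key=lambda e: cosine(q_vec, e[0]), reverse=True)
    seen, out = set(), []
    for _, chunk in ranked:
        if chunk not in seen:
            seen.add(chunk)
            out.append(chunk)
        if len(out) == k:
            break
    return out
```

Retrieving via questions rather than raw chunks matches query and index entries in the same surface form (question-to-question), which is the intuition behind the recall gains reported above.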