Retrieval-Augmented Generation (RAG) improves the accuracy of Large Language Model (LLM) responses by leveraging relevant external documents during generation. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various language models on custom datasets derived from a multi-hop QA task. Keeping the context length and the position of relevant information constant while varying the number of documents, we find that increasing the document count in RAG settings poses significant challenges for most LLMs, reducing performance by up to 20%. However, Qwen2.5 maintained consistent results across increasing document counts, indicating better multi-document handling capability. Finally, our results indicate that processing multiple documents is a separate challenge from handling long contexts. We also make the datasets and code available: https://github.com/shaharl6000/MoreDocsSameLen.