Current RAG retrievers are designed primarily for human readers, emphasizing complete, readable, and coherent paragraphs. However, Large Language Models (LLMs) benefit more from precise, compact, and well-structured input, which enhances reasoning quality and efficiency. Existing methods rely on reranking or summarization to identify key sentences, but these may introduce semantic breaks and unfaithful content. Thus, efficiently extracting and organizing answer-relevant clues from large-scale documents while reducing LLM reasoning costs remains a challenge for RAG systems. Inspired by Occam's razor, we frame LLM-centric retrieval as a MinMax optimization: maximizing the extraction of potential clues and reranking them into a well-organized order, while minimizing reasoning costs by truncating to the smallest sufficient set of clues. In this paper, we propose CompSelect, a compact clue selection mechanism for LLM-centric RAG, consisting of a clue extractor, a reranker, and a truncator. (1) The clue extractor uses answer-containing sentences as fine-tuning targets, aiming to extract sufficient potential clues; (2) The reranker is trained to prioritize effective clues based on real LLM feedback; (3) The truncator uses truncated text containing the minimum sufficient clues for answering the question as its fine-tuning target, thereby enabling efficient RAG reasoning. Experiments on three QA datasets demonstrate that CompSelect improves performance while reducing both total and online latency compared to a range of baseline methods. Further analysis also confirms its robustness to unreliable retrieval and its generalization across different scenarios.
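The three-stage extract–rerank–truncate pipeline described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: the paper fine-tunes dedicated models for each stage, whereas the scoring function here (`overlap_score`) is a hypothetical word-overlap placeholder, and the token budget is a stand-in for the learned truncation point.

```python
def extract_clues(sentences, query, score_fn, threshold=0.5):
    """Stage 1: keep candidate clue sentences whose relevance score
    passes a threshold (the paper instead fine-tunes an extractor on
    answer-containing sentences)."""
    return [s for s in sentences if score_fn(s, query) >= threshold]

def rerank(clues, query, score_fn):
    """Stage 2: order clues by descending relevance so effective clues
    come first (the paper trains this on real LLM feedback)."""
    return sorted(clues, key=lambda s: score_fn(s, query), reverse=True)

def truncate(ranked_clues, budget):
    """Stage 3: cut to the smallest prefix that fits a token budget,
    approximating the minimum sufficient clue set."""
    kept, used = [], 0
    for clue in ranked_clues:
        n_tokens = len(clue.split())
        if used + n_tokens > budget:
            break
        kept.append(clue)
        used += n_tokens
    return kept

def overlap_score(sentence, query):
    """Placeholder relevance score: fraction of query words in the sentence."""
    q = set(query.lower().split())
    s = set(sentence.lower().split())
    return len(q & s) / max(len(q), 1)

sentences = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
]
query = "What is the capital of France?"
clues = extract_clues(sentences, query, overlap_score, threshold=0.3)
ranked = rerank(clues, query, overlap_score)
compact = truncate(ranked, budget=10)
```

With this toy scorer, the off-topic sentence is filtered out at extraction, the remaining clues are ordered by relevance, and truncation keeps only the single highest-scoring sentence within the budget, leaving a compact context for the LLM.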