Fixed-length chunking in Retrieval-Augmented Generation (RAG) often causes boundary fragmentation: critical evidence is split across adjacent segments, which degrades retrieval recall. Static windowing and parent retrieval recover that recall, but at the cost of significant token overhead. We propose SCAR (Semantic Continuity-Aware Retrieval), an adaptive retrieval policy that selectively expands a retrieved chunk into its neighbors by weighing query-neighbor relevance against a structural continuity penalty. SCAR uses a relative expansion threshold tied to each retrieved chunk's own query relevance, which yields an approximately scale-invariant decision rule that transfers across embedding models without recalibration. Across four diverse corpora (RFC, GDPR, a 10-K report, and a merger agreement; N=320 queries, 160 of them boundary-fragmented), SCAR achieves 92.8% recall on boundary-fragmented queries while retrieving an average of only 7.84 chunks, a 22.9% reduction compared to static windowing (10.16 chunks). Paired bootstrap tests (B=10,000) confirm that the chunk reduction is highly significant (p<0.0001, Cohen's d=-1.49, a large effect), while the accompanying recall difference is small (Cohen's d=-0.33). The policy transfers across three embedding models (text-embedding-3-large, BGE-large-en-v1.5, zembed-1) using the same single hyperparameter setting, and downstream RAGAS evaluation on the 10-K corpus confirms that SCAR preserves generation faithfulness while reducing context tokens by 27.1%.
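To make the idea of a relative expansion threshold concrete, here is a minimal Python sketch of one way such a rule could look. It is an illustration under stated assumptions, not the SCAR implementation: the abstract does not give the scoring formula, so the multiplicative penalty discount, the names `should_expand`, `expand_neighbors`, `boundary_penalties`, and the default `ratio=0.8` are all hypothetical. The sketch only shows why comparing a neighbor's relevance to a fraction of the retrieved chunk's own relevance is insensitive to the overall similarity scale of an embedding model.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def should_expand(anchor_rel, neighbor_rel, penalty, ratio=0.8):
    """Hypothetical relative-threshold rule (an assumption, not the paper's formula).

    anchor_rel:   query relevance of the retrieved chunk
    neighbor_rel: query relevance of the adjacent chunk
    penalty:      structural continuity penalty in [0, 1], e.g. higher when
                  the boundary between the two chunks crosses a section break
    ratio:        the single hyperparameter; the neighbor must retain at least
                  this fraction of the anchor's relevance after discounting
    """
    discounted = neighbor_rel * (1.0 - penalty)
    return discounted >= ratio * anchor_rel


def expand_neighbors(query_vec, chunk_vecs, retrieved, boundary_penalties, ratio=0.8):
    """Return sorted chunk indices after selective neighbor expansion.

    boundary_penalties[k] is the assumed penalty for crossing the boundary
    between chunk k and chunk k + 1, so it has len(chunk_vecs) - 1 entries.
    """
    selected = set(retrieved)
    for i in retrieved:
        anchor_rel = cosine(query_vec, chunk_vecs[i])
        for j in (i - 1, i + 1):
            if j < 0 or j >= len(chunk_vecs) or j in selected:
                continue
            neighbor_rel = cosine(query_vec, chunk_vecs[j])
            penalty = boundary_penalties[min(i, j)]
            if should_expand(anchor_rel, neighbor_rel, penalty, ratio):
                selected.add(j)
    return sorted(selected)


if __name__ == "__main__":
    query = [1.0, 0.0]
    chunks = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.4]]  # toy 2-D "embeddings"
    print(expand_neighbors(query, chunks, retrieved=[1], boundary_penalties=[0.0, 0.0]))
```

Because both sides of the comparison are query relevances, multiplying every similarity by the same constant leaves the decision unchanged in this sketch, which is the sense in which a relative threshold can carry over between embedding models with different similarity ranges. A fixed absolute threshold would not have this property.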