Retrieval-Augmented Generation (RAG) systems often rely on fixed top-k document selection mechanisms that ignore downstream generation quality and impose computational overhead. We propose SRAS (Sparse Reward-Aware Selector), a lightweight document selector trained via reinforcement learning (RL) for edge-native RAG deployment. Unlike prior RL-based retrievers that assume large memory and latency budgets, SRAS learns a compact (~0.76 MB) policy using Proximal Policy Optimization (PPO), guided by a hybrid reward signal that combines Relaxed F1 and BERTScore. Our method operates under tight token and compute constraints, maintaining <1 s latency on CPU. SRAS outperforms supervised and random selectors on a synthetic QA benchmark and generalizes to real-world data, achieving a BERTScore F1 of 0.8546 on SQuAD v2 without domain-specific tuning. This work is the first to demonstrate that RL-based document selection can be made ultra-lightweight, latency-aware, and effective for on-device RAG pipelines.
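To make the hybrid reward concrete, the sketch below combines a token-overlap Relaxed F1 term with a BERTScore F1 term. The weighting coefficient `alpha` and the bag-of-words formulation of Relaxed F1 are assumptions for illustration, not taken from the paper; BERTScore is computed with the public bert-score package.

```python
# Minimal sketch of a hybrid reward of the kind described in the abstract.
# NOTE: alpha and the exact Relaxed F1 definition are assumptions, not the
# paper's specification. Requires: pip install bert-score
from collections import Counter
from bert_score import score as bertscore


def relaxed_f1(prediction: str, reference: str) -> float:
    """Token-level F1 with relaxed (bag-of-words) matching; assumed formulation."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def hybrid_reward(generated: str, gold: str, alpha: float = 0.5) -> float:
    """Reward = alpha * RelaxedF1 + (1 - alpha) * BERTScore F1 (alpha assumed)."""
    _, _, f1 = bertscore([generated], [gold], lang="en", verbose=False)
    return alpha * relaxed_f1(generated, gold) + (1 - alpha) * float(f1[0])
```

A scalar reward of this form could then be fed to a PPO trainer as the per-episode return for the selector policy; the split between the two terms trades off exact lexical overlap against semantic similarity.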