Retrieval-augmented generation (RAG) has become a dominant paradigm for grounding large language models (LLMs) with external evidence in knowledge-intensive question answering. A core design choice is how to fuse retrieved samples into the LLM; existing internal fusion approaches broadly fall into query-based fusion, parametric fusion, and latent-based fusion. Despite their effectiveness at modest retrieval scales, these methods often fail to scale gracefully as the number of retrieved candidates k increases: larger k improves evidence coverage, yet realistic top-k retrieval inevitably contains irrelevant or redundant content and raises inference cost. To address these limitations, we propose ReFilter, a novel latent-based fusion framework that performs token-level filtering and fusion. ReFilter consists of three key components: a context encoder for encoding context features, a gated filter for weighting each token, and a token fusion module for integrating the weighted token features into the LLM's hidden states. Our experiments across four general-domain QA benchmarks show that ReFilter consistently achieves the best average performance under both in-domain adaptation and out-of-domain transfer. ReFilter further generalizes to five biomedical QA benchmarks in zero-shot transfer without domain fine-tuning, reaching 70.01% average accuracy with Qwen2.5-14B-Instruct.
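To make the three-component pipeline concrete, the following is a minimal numpy sketch of the token-level gating-and-fusion idea. It is an illustration under stated assumptions, not the paper's implementation: the sigmoid gate, the parameter `W_gate`, and the additive fusion into a single hidden-state position are all hypothetical choices standing in for the actual context encoder, gated filter, and token fusion module.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hidden size (illustrative)
n_tokens = 16  # number of retrieved context tokens after encoding

# Stand-in for the context encoder output: one feature vector
# per token of the retrieved passages.
ctx = rng.standard_normal((n_tokens, d))

# Gated filter (hypothetical form): a learned projection scores each
# token, and a sigmoid turns the score into a weight in (0, 1), so
# irrelevant or redundant tokens can be down-weighted toward zero.
W_gate = rng.standard_normal((d, 1))
gates = 1.0 / (1.0 + np.exp(-(ctx @ W_gate)))  # shape (n_tokens, 1)

# Token fusion (hypothetical form): additively fold the gate-weighted
# context features into an LLM hidden state at one position.
hidden = rng.standard_normal(d)
fused = hidden + (gates * ctx).sum(axis=0)

print(fused.shape)  # (8,)
```

In this sketch, increasing k only grows `n_tokens`; the per-token gates are what keep the extra, possibly noisy context from dominating the fused hidden state.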