The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy. Extensive experiments show that DRIFT significantly improves performance on long-context tasks, outperforming strong baselines among comparably sized models. Our approach provides a scalable and efficient paradigm for extending the effective context window and reasoning capabilities of LLMs. Our code is available at https://github.com/Lancelot-Xie/DRIFT.
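The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all dimensions, the attention-pooling compressor, and the linear projector are hypothetical stand-ins chosen only to show how query-conditioned fact tokens could replace raw chunk tokens in the reasoning model's input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only (not from the paper).
D_KNOW, D_REASON = 64, 128      # hidden dims of knowledge / reasoning model
N_FACT = 8                      # implicit fact tokens emitted per chunk
CHUNK_LEN, QUERY_LEN = 200, 16  # raw chunk length vs. query length

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_chunk(chunk_h, query_h, fact_seeds):
    """Query-conditioned pooling: each learned fact seed, shifted by the
    pooled query representation, attends over the chunk's token states
    and yields one dense fact vector."""
    probes = fact_seeds + query_h.mean(axis=0)            # (N_FACT, D_KNOW)
    attn = softmax(probes @ chunk_h.T / np.sqrt(D_KNOW))  # (N_FACT, CHUNK_LEN)
    return attn @ chunk_h                                 # (N_FACT, D_KNOW)

# Stand-ins for the lightweight knowledge model's hidden states.
chunk_h = rng.standard_normal((CHUNK_LEN, D_KNOW))
query_h = rng.standard_normal((QUERY_LEN, D_KNOW))
fact_seeds = rng.standard_normal((N_FACT, D_KNOW))

facts = compress_chunk(chunk_h, query_h, fact_seeds)

# Linear projector into the reasoning model's embedding space.
W_proj = rng.standard_normal((D_KNOW, D_REASON)) / np.sqrt(D_KNOW)
fact_tokens = facts @ W_proj                              # (N_FACT, D_REASON)

# The reasoning model now consumes N_FACT dense tokens in place of
# CHUNK_LEN raw text tokens, prepended to its own query embeddings.
query_emb = rng.standard_normal((QUERY_LEN, D_REASON))
reasoner_input = np.concatenate([fact_tokens, query_emb], axis=0)
print(reasoner_input.shape)  # (24, 128): 200 chunk tokens -> 8 fact tokens
```

The compression ratio here (200 tokens to 8 fact tokens) is arbitrary; the point is that the reasoning model's effective context grows because each retrieved chunk occupies a fixed, small number of input positions regardless of its raw length.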