Long-context reasoning requires accurately identifying relevant information in extensive, noisy input contexts. Prior research shows that using test-time learning to encode context directly into model parameters can effectively enable reasoning over noisy information. However, meta-learning methods for enabling test-time learning are prohibitively memory-intensive, preventing their application to long-context settings. In this work, we propose PERK (Parameter Efficient Reasoning over Knowledge), a scalable approach for learning to encode long input contexts using gradient updates to a lightweight model adapter at test time. Specifically, PERK employs two nested optimization loops in a meta-training phase. The inner loop rapidly encodes contexts into a low-rank adapter (LoRA) that serves as a parameter-efficient memory module for the base model. Concurrently, the outer loop learns to use the updated adapter to accurately recall and reason over relevant information from the encoded long context. Our evaluations on several long-context reasoning tasks show that PERK significantly outperforms the standard prompt-based long-context baseline, achieving average absolute performance gains of up to 90% for smaller models (GPT-2) and up to 27% for our largest evaluated model, Qwen-2.5-0.5B. In general, PERK is more robust to reasoning complexity, length extrapolation, and the position of relevant information within the context. Finally, we show that while PERK is memory-intensive during training, it scales more efficiently at inference time than prompt-based long-context inference.
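The two nested optimization loops can be illustrated with a toy first-order meta-learning sketch. This is not the PERK implementation: where PERK adapts a LoRA module inside a transformer, a linear regression model with a low-rank adapter stands in, and all names, dimensions, and hyperparameters below are illustrative assumptions. The inner loop encodes each "context" (a task-specific low-rank shift of a shared linear map) into a fresh adapter `A @ B`; the outer loop updates the base weights `W` so that this adaptation answers "queries" well.

```python
import numpy as np

D, O, R = 6, 4, 2                  # input dim, output dim, adapter rank (toy sizes)
INNER_STEPS, INNER_LR = 20, 0.05   # inner loop: encode the context into the adapter
META_STEPS, META_LR = 300, 0.05    # outer loop: meta-train the base weights

truth_rng = np.random.default_rng(0)
M_shared = truth_rng.normal(size=(O, D))  # ground-truth component shared across tasks

def sample_task(rng):
    """One 'context': the shared map plus a task-specific rank-R shift."""
    U, V = rng.normal(size=(O, R)), rng.normal(size=(R, D))
    M = M_shared + U @ V
    x_ctx, x_qry = rng.normal(size=(32, D)), rng.normal(size=(32, D))
    return x_ctx, x_ctx @ M.T, x_qry, x_qry @ M.T

def loss_and_grad(W, A, B, x, y):
    """MSE of the adapted model and its gradient wrt the combined map W + A @ B."""
    err = x @ (W + A @ B).T - y
    return float(np.mean(err ** 2)), 2.0 * err.T @ x / len(x)

def adapt(W, x_ctx, y_ctx, rng):
    """Inner loop: gradient steps on a fresh low-rank adapter, base W frozen."""
    A, B = np.zeros((O, R)), rng.normal(scale=0.3, size=(R, D))  # delta starts at zero
    for _ in range(INNER_STEPS):
        _, G = loss_and_grad(W, A, B, x_ctx, y_ctx)
        A, B = A - INNER_LR * G @ B.T, B - INNER_LR * A.T @ G  # chain rule through A @ B
    return A, B

def meta_train(steps=META_STEPS, seed=42):
    """Outer loop: first-order meta-gradient of the query loss wrt the base weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(O, D))
    for _ in range(steps):
        x_ctx, y_ctx, x_qry, y_qry = sample_task(rng)
        A, B = adapt(W, x_ctx, y_ctx, rng)           # encode context (inner loop)
        _, G = loss_and_grad(W, A, B, x_qry, y_qry)  # query loss, adapter held fixed
        W -= META_LR * G
    return W

def eval_adapted(W, n_tasks=20, seed=1):
    """Average query MSE after test-time adaptation on fresh tasks."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_tasks):
        x_ctx, y_ctx, x_qry, y_qry = sample_task(rng)
        A, B = adapt(W, x_ctx, y_ctx, rng)
        losses.append(loss_and_grad(W, A, B, x_qry, y_qry)[0])
    return float(np.mean(losses))

W_init = np.random.default_rng(7).normal(scale=0.1, size=(O, D))
W_meta = meta_train()
print(f"adapted query MSE  random init: {eval_adapted(W_init):.2f}  "
      f"meta-trained: {eval_adapted(W_meta):.2f}")
```

The key property the sketch shows is that meta-training shapes the base weights so that a few inner-loop steps on the small adapter suffice to absorb the task-specific (low-rank) part of each context; with a random base, the same adaptation budget helps far less. The true PERK objective would backpropagate through the inner loop's updates inside a transformer, which is what makes it memory-intensive at training time.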