Transformer-based language models (LMs) track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe an LM's ability to keep track of multiple fact updates, a MemoryPrompt-augmented LM outperforms much larger LMs that have access to the full input history. We also test MemoryPrompt on a long-distance dialogue dataset, where its performance is comparable to that of a model conditioned on the entire conversation history. In both experiments we also observe that, unlike full-finetuning approaches, MemoryPrompt does not suffer from catastrophic forgetting when adapted to new tasks, and thus does not disrupt the generalist capabilities of the underlying LM.
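To make the data flow concrete, the following is a minimal sketch of the prefixing idea, not the implementation used in this work. All names (`MemoryModule`, `frozen_lm_summary`, `run`) are hypothetical, the recurrent cell is a toy tanh recurrence with random weights, and the frozen LM is replaced by a mean-pooling stand-in; the abstract specifies none of these details. The sketch only shows that a small recurrent state is carried across input segments and exposed to the LM as a few prefix vectors, while the LM itself is never updated.

```python
import math
import random


class MemoryModule:
    """Toy recurrent memory whose state is read out as prefix vectors.

    Assumption: a simple tanh recurrence with random weights stands in
    for the auxiliary recurrent network.
    """

    def __init__(self, dim, n_prefix, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.n_prefix = n_prefix
        size = dim * n_prefix
        self.state = [0.0] * size
        # Input projection (size x dim) and a diagonal recurrent weight.
        self.w_in = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(size)]
        self.w_rec = [rng.uniform(-0.1, 0.1) for _ in range(size)]

    def update(self, segment_summary):
        """Fold a summary of the latest segment into the recurrent state."""
        new_state = []
        for i, previous in enumerate(self.state):
            pre = sum(w * x for w, x in zip(self.w_in[i], segment_summary))
            pre += self.w_rec[i] * previous
            new_state.append(math.tanh(pre))
        self.state = new_state

    def prefix(self):
        """Reshape the flat state into n_prefix vectors of size dim."""
        d = self.dim
        return [self.state[i * d:(i + 1) * d] for i in range(self.n_prefix)]


def frozen_lm_summary(input_vectors):
    """Stand-in for the frozen LM: mean-pool the input vectors.

    Assumption: a real system would run the LM and read out its hidden
    states; here the weights-free pooling only mimics "no finetuning".
    """
    n = len(input_vectors)
    return [sum(column) / n for column in zip(*input_vectors)]


def run(segments, dim=4, n_prefix=2):
    """Process segments in order, prefixing each with the memory vectors."""
    memory = MemoryModule(dim, n_prefix)
    input_lengths = []
    for segment in segments:
        lm_input = memory.prefix() + segment   # soft-prompt-like prefix
        input_lengths.append(len(lm_input))
        memory.update(frozen_lm_summary(lm_input))
    return memory, input_lengths


# Two segments of 3 and 5 "token embeddings", each of dimension 4.
segments = [[[0.1] * 4 for _ in range(3)], [[0.2] * 4 for _ in range(5)]]
memory, input_lengths = run(segments)
```

In this sketch each segment reaches the LM stand-in with only `n_prefix` extra vectors, so the input length stays bounded regardless of how many segments came before; in a trained system only the memory module's weights would be learned.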