Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate whether coupling a dynamically adaptable external memory to an LLM can alleviate this problem. To this end, we test Larimar, a recently proposed language model architecture that uses an external associative memory, on long-context recall tasks, including passkey and needle-in-the-haystack tests. We demonstrate that Larimar's external memory, which allows fast writing and reading of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory, to which the long contexts are written, steer the decoder toward generating correct outputs, with the memory stored off the GPU. Compared to existing transformer-based LLM architectures for long-context recall that rely on larger parameter counts or modified attention mechanisms, a relatively small Larimar model maintains strong performance without any task-specific training or training on longer contexts.