Modern large language models (LLMs) are trained in a regime where most training examples are seen only a few times. What does a model remember about such examples, and how long does that memory persist as training continues with new examples? Here, we investigate these questions through simple recognition, recall, and retention experiments with LLMs. In recognition experiments, we ask whether the model can distinguish a seen example from a novel one; in recall experiments, we ask whether the model can correctly recall a seen example when cued with part of it; and in retention experiments, we periodically probe the model's memory for the original examples as it is trained continuously on new examples. We find that a single exposure is generally sufficient for a model to achieve near-perfect accuracy, even in very challenging recognition experiments. We estimate that the recognition performance of even small language models easily exceeds the human recognition performance reported in similar experiments (Shepard, 1967). Achieving near-perfect recall takes more exposures, but most models can do it in just 3 exposures. The flip side of this remarkable capacity for fast learning is that precise memories are quickly overwritten: recall performance for the original examples drops steeply over the first 10 training updates with new examples, followed by a more gradual decline. Even after 100K updates, however, some of the original examples are still recalled near perfectly. A qualitatively similar retention pattern has previously been observed in studies of human long-term memory retention (Bahrick, 1984). Finally, recognition is much more robust to interference than recall, and memory for natural language sentences is generally superior to memory for stimuli without structure.
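The recognition and recall probes described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how either probe is scored, so everything here is an assumption made for illustration: recognition is treated as a two-alternative choice in which the model "picks" whichever example it scores higher, recall is counted as an exact match of the continuation after a fixed-length cue, and `ToyMemorizer` is a hypothetical stand-in for an LLM that stores strings verbatim.

```python
from typing import Callable, Sequence


def recognition_accuracy(
    score: Callable[[str], float],
    seen: Sequence[str],
    novel: Sequence[str],
) -> float:
    """Fraction of (seen, novel) pairs in which the seen example scores strictly higher.

    Assumption: `score` plays the role of a model log-likelihood (higher = more familiar).
    """
    pairs = list(zip(seen, novel))
    correct = sum(1 for s, n in pairs if score(s) > score(n))
    return correct / len(pairs)


def recall_accuracy(
    complete: Callable[[str], str],
    examples: Sequence[str],
    cue_len: int,
) -> float:
    """Fraction of examples whose remainder is reproduced exactly from the first `cue_len` characters."""
    hits = sum(1 for ex in examples if complete(ex[:cue_len]) == ex[cue_len:])
    return hits / len(examples)


class ToyMemorizer:
    """Hypothetical stand-in for an LLM: it simply stores training strings verbatim."""

    def __init__(self) -> None:
        self.memory: list[str] = []

    def train(self, example: str) -> None:
        # One "exposure" to a training example.
        self.memory.append(example)

    def score(self, text: str) -> float:
        # Stored strings get a higher score than anything else.
        return 0.0 if text in self.memory else -1.0

    def complete(self, cue: str) -> str:
        # Return the remainder of the first stored string that starts with the cue.
        for stored in self.memory:
            if stored.startswith(cue):
                return stored[len(cue):]
        return ""


seen = ["the cat sat on the mat", "a dog barked at the moon"]
novel = ["the cat sat on the rug", "a dog howled at the moon"]

model = ToyMemorizer()
for example in seen:
    model.train(example)

recognition = recognition_accuracy(model.score, seen, novel)
recall = recall_accuracy(model.complete, seen, cue_len=5)
```

A retention probe, under the same assumptions, would amount to re-evaluating `recognition_accuracy` and `recall_accuracy` on the original examples at intervals while the model continues to train on new ones. The toy memorizer never forgets, so it cannot reproduce the decline the abstract reports.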