Memory is one of the most essential cognitive functions, serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorization ability. In contrast, vanilla neural networks without pre-training have long been observed to suffer from catastrophic forgetting. To investigate this retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments that control the target knowledge types, the learning strategies, and the learning schedules. We find that: 1) vanilla language models are forgetful; 2) pre-training leads to retentive language models; 3) knowledge relevance and diversification significantly influence memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms for language models.