Understanding whether and to what extent large language models (LLMs) have memorised training data has important implications for the reliability of their output and for the privacy of their training data. To cleanly measure memorisation and disentangle it from other phenomena (e.g. in-context learning), we introduce an experimental framework based on repeatedly exposing LLMs to random strings, which lets us track the memorisation dynamics, i.e., how the model's behaviour evolves over repeated exposures. Using this framework, we make several striking observations: (a) we find consistent phases in the dynamics across model families (Pythia, Phi and Llama2), (b) we identify factors that make some strings easier to memorise than others, and (c) we identify the roles of local prefixes and global context in memorisation. We also show that sequential exposure to different random strings has a significant effect on memorisation. Our results, often surprising, have significant downstream implications for the study and usage of LLMs.
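The measurement protocol sketched above can be caricatured in a few lines of code. The sketch below is purely illustrative and is our own assumption, not the paper's setup: it replaces the LLM with a toy character-bigram table, "exposes" it repeatedly to a random string, and scores memorisation as the fraction of characters recovered by greedy completion from a short prefix. The names (`BigramLM`, `expose`, `memorised_fraction`) are hypothetical.

```python
import random
import string

def random_string(n, seed=0):
    """Generate a reproducible random lowercase string to be memorised."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))

class BigramLM:
    """Toy stand-in for an LLM: a character-bigram count table.

    One call to expose() plays the role of one training exposure
    to the target string.
    """
    def __init__(self):
        self.counts = {}

    def expose(self, s):
        # Accumulate bigram counts, analogous to a gradient step on s.
        for a, b in zip(s, s[1:]):
            self.counts.setdefault(a, {})
            self.counts[a][b] = self.counts[a].get(b, 0) + 1

    def greedy_complete(self, prefix, length):
        # Greedily extend the prefix with the most frequent successor.
        out = prefix
        while len(out) < length:
            nxt = self.counts.get(out[-1])
            if not nxt:
                break
            out += max(nxt, key=nxt.get)
        return out

def memorised_fraction(model, s, prefix_len=5):
    """Score memorisation: how much of s the model reproduces from a prefix."""
    completion = model.greedy_complete(s[:prefix_len], len(s))
    matches = sum(a == b for a, b in zip(completion, s))
    return matches / len(s)

model = BigramLM()
target = random_string(40, seed=1)
scores = [memorised_fraction(model, target)]  # before any exposure
for epoch in range(5):
    model.expose(target)                      # repeated exposure
    scores.append(memorised_fraction(model, target))
```

`scores` then traces the memorisation dynamics over exposures; exposing the same model to a second random string afterwards and re-scoring the first would illustrate the interference effect of sequential exposure, in miniature.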