The proliferation of large language models has revolutionized natural language processing tasks, yet it raises profound concerns regarding data privacy and security. Language models are trained on extensive corpora that may include sensitive or proprietary information, and the risk of data leakage, in which model responses reveal pieces of such information, remains inadequately understood. This study examines susceptibility to data leakage by quantifying the phenomenon of memorization in machine learning models, focusing on how memorization patterns evolve over the course of training. We investigate how the statistical characteristics of the training data shape what the model memorizes, evaluating in particular the effect of repetition. We reproduce findings that the probability of memorizing a sequence scales logarithmically with the number of times it appears in the data. Furthermore, we find that sequences which do not appear memorized after their first encounter can surface as memorized later in training, even without subsequent encounters. The presence of these latent memorized sequences poses a challenge for data privacy, since such sequences may remain hidden at the model's final checkpoint. To address this, we develop a diagnostic test for uncovering latent memorized sequences by examining their cross-entropy loss.
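The abstract only sketches the cross-entropy diagnostic at a high level; the snippet below is a minimal illustrative sketch (not the authors' released code) of the underlying measurement: tracking the average per-token cross-entropy of a candidate sequence across successive checkpoints, where the checkpoint paths and the candidate string are placeholders.

```python
# Illustrative sketch: per-sequence cross-entropy across checkpoints.
# Checkpoint names and the candidate sequence are placeholders, not
# values from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_cross_entropy(model, tokenizer, text):
    """Average per-token cross-entropy of `text` under `model`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # HF shifts labels internally
    return out.loss.item()

checkpoints = ["checkpoint-1000", "checkpoint-2000", "checkpoint-3000"]
candidate = "a training sequence encountered once early in training"

tokenizer = AutoTokenizer.from_pretrained(checkpoints[0])
losses = []
for ckpt in checkpoints:
    model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
    losses.append(sequence_cross_entropy(model, tokenizer, candidate))

# A sequence whose loss stays anomalously low relative to unseen control
# sequences at later checkpoints, despite no further occurrences in the
# data, is a candidate latent memorization.
print(list(zip(checkpoints, losses)))
```

In this sketch the signal is the gap between the candidate's loss and that of matched control sequences never seen in training; how that gap is thresholded is left open, since the abstract does not specify the test's decision rule.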