Probing memorization in large language models is an important research problem. Previous work has established metrics for quantifying memorization, explored influencing factors such as data duplication, model size, and prompt length, and evaluated memorization by comparing model outputs with training corpora. However, training corpora are enormous and pre-processing them is time-consuming. To explore memorization without access to training data, we propose a novel approach, named ROME, in which memorization is studied by comparing the disparities between memorized and non-memorized samples. Specifically, the model first categorizes the selected samples into memorized and non-memorized groups, and the samples in the two groups are then compared from the perspectives of text, probability, and hidden state. Experimental findings show disparities in factors including word length, part-of-speech, word frequency, mean, and variance, among others.
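The two-step procedure described above (group the samples, then compare the groups) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the criterion for labeling a sample as memorized, so exact match between the model's output and a reference answer is assumed here. The names `split_by_memorization`, `word_length_stats`, and `toy_generate` are hypothetical, and word length is used only as one example of a text-level factor.

```python
from statistics import mean, pvariance


def split_by_memorization(samples, generate):
    """Split (prompt, reference) pairs into memorized and non-memorized groups.

    Assumption: a sample counts as memorized when the model's output matches
    the reference exactly. The abstract does not specify the actual criterion.
    """
    memorized, non_memorized = [], []
    for prompt, reference in samples:
        if generate(prompt).strip() == reference.strip():
            memorized.append((prompt, reference))
        else:
            non_memorized.append((prompt, reference))
    return memorized, non_memorized


def word_length_stats(group):
    """Mean and population variance of word lengths over a group's references."""
    lengths = [len(word) for _, reference in group for word in reference.split()]
    if not lengths:
        return None
    return {"mean": mean(lengths), "variance": pvariance(lengths)}


def toy_generate(prompt):
    """Hypothetical stand-in for a language model: a fixed lookup table."""
    lookup = {
        "The capital of France is": "Paris",
        "The author of Hamlet is": "Marlowe",
    }
    return lookup.get(prompt, "")


samples = [
    ("The capital of France is", "Paris"),
    ("The author of Hamlet is", "Shakespeare"),
]

memorized, non_memorized = split_by_memorization(samples, toy_generate)
print("memorized:", word_length_stats(memorized))
print("non-memorized:", word_length_stats(non_memorized))
```

In practice `toy_generate` would be replaced by a call to the model under study, and the same grouping could feed comparisons of token probabilities or hidden states rather than word length alone.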