Probing memorization in large language models is important. Previous work has established metrics for quantifying memorization, explored influencing factors such as data duplication, model size, and prompt length, and evaluated memorization by comparing model outputs with training corpora. However, training corpora are enormous, and pre-processing them is time-consuming. To explore memorization without accessing training data, we propose a novel approach, named ROME, in which memorization is explored by comparing disparities between memorized and non-memorized samples. Specifically, models first categorize the selected samples into memorized and non-memorized groups, and we then compare the samples in the two groups from the perspectives of text, probability, and hidden state. Experimental findings show disparities in factors including word length, part-of-speech, word frequency, mean, and variance, to name a few.
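The group-and-compare idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the exact-match criterion for "memorized", the sample fields (`reference`, `prediction`, `probability`), the helper names, and the toy data are all hypothetical. The sketch covers only the text and probability perspectives; hidden states are omitted.

```python
from statistics import mean, pvariance


def split_by_memorization(samples):
    """Split samples into (memorized, non_memorized) groups.

    Assumption: a sample counts as memorized when the model's prediction
    matches the reference exactly (case-insensitive). No training data
    is consulted at any point.
    """
    memorized, non_memorized = [], []
    for sample in samples:
        hit = sample["prediction"].strip().lower() == sample["reference"].strip().lower()
        (memorized if hit else non_memorized).append(sample)
    return memorized, non_memorized


def summarize(group):
    """Compute simple text and probability statistics for one group."""
    if not group:
        return None
    word_lengths = [len(s["reference"]) for s in group]
    probs = [s["probability"] for s in group]
    return {
        "n": len(group),
        "mean_word_length": mean(word_lengths),
        "prob_mean": mean(probs),
        "prob_variance": pvariance(probs),
    }


# Hypothetical toy samples: reference answer, model prediction, and the
# probability the model assigned to its own prediction.
samples = [
    {"reference": "dog", "prediction": "dog", "probability": 0.9},
    {"reference": "cat", "prediction": "Cat", "probability": 0.7},
    {"reference": "elephant", "prediction": "elephants", "probability": 0.4},
    {"reference": "giraffe", "prediction": "zebra", "probability": 0.2},
]

memorized, non_memorized = split_by_memorization(samples)
memorized_stats = summarize(memorized)
non_memorized_stats = summarize(non_memorized)
print("memorized:    ", memorized_stats)
print("non-memorized:", non_memorized_stats)
```

Comparing the two summaries side by side is the kind of disparity analysis the abstract refers to; a real study would replace the toy criterion and statistics with the ones the authors define.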