Memorization of training data in language models affects both model capability (generalization) and safety (privacy risk). This paper analyzes how prompts influence the detection of memorization in six masked-language-model-based named entity recognition models. Specifically, we employ a diverse set of 400 automatically generated prompts and a pairwise dataset in which each pair consists of one person's name from the training set and another name outside it. A prompt completed with a person's name serves as input for obtaining the model's confidence in predicting that name. The performance of a prompt at detecting memorization is then quantified as the percentage of name pairs for which the model is more confident about the name from the training set. We show that performance varies across prompts by as much as 16 percentage points on the same model, and that prompt engineering widens this gap further. Moreover, our experiments demonstrate that prompt performance is model-dependent but generalizes across different name sets. A comprehensive analysis shows how prompt performance is influenced by prompt properties, the tokens a prompt contains, and the model's self-attention weights over the prompt.
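The pairwise detection metric described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_confidence` is a hypothetical stand-in for the masked language model's probability of predicting a name in the completed prompt, and the names and scores are invented for demonstration.

```python
def detection_rate(confidence, prompt, name_pairs):
    """Percentage of pairs where the in-training-set name (first element)
    receives higher model confidence than the out-of-set name (second)."""
    wins = sum(
        1
        for in_name, out_name in name_pairs
        if confidence(prompt, in_name) > confidence(prompt, out_name)
    )
    return 100.0 * wins / len(name_pairs)


# Hypothetical confidence scores standing in for an MLM's prediction
# probability of each name when it completes the prompt.
toy_scores = {"Alice Smith": 0.8, "Bob Jones": 0.3, "Carol Lee": 0.6, "Dan Wu": 0.7}
toy_confidence = lambda prompt, name: toy_scores[name]

# Each pair: (name from the training set, name outside the training set).
pairs = [("Alice Smith", "Bob Jones"), ("Carol Lee", "Dan Wu")]
print(detection_rate(toy_confidence, "My name is {name}.", pairs))  # 50.0
```

A rate well above 50% would indicate that the prompt elicits systematically higher confidence for names the model has seen in training, i.e., detectable memorization; 50% corresponds to chance.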