Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization evaluation for LLMs, arguing that PII leakage should be evaluated under low lexical cue conditions, where target PII cannot be reconstructed through prompt-induced generalization or pattern completion. We formalize Cue-Resistant Memorization (CRM) as a cue-controlled evaluation framework and a necessary condition for valid memorization evaluation, explicitly conditioning on prompt-target overlap cues. Using CRM, we conduct a large-scale multilingual re-evaluation of PII leakage across 32 languages and multiple memorization paradigms. Revisiting reconstruction-based settings, including verbatim prefix-suffix completion and associative reconstruction, we find that their apparent effectiveness is driven primarily by direct surface-form cues rather than by true memorization. When such cues are controlled for, reconstruction success diminishes substantially. We further examine cue-free generation and membership inference, both of which exhibit extremely low true positive rates. Overall, our results suggest that previously reported PII leakage is better explained by cue-driven behavior than by genuine memorization, highlighting the importance of cue-controlled evaluation for reliably quantifying privacy-relevant memorization in LLMs.
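The "low lexical cue" condition could, in principle, be operationalized as a surface-overlap check between the prompt and the target PII. The sketch below is our own illustration, not the paper's implementation; the character n-gram size and the threshold are assumptions chosen for clarity.

```python
def ngrams(text, n=3):
    """Character n-grams of a string (case-insensitive)."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def lexical_cue_score(prompt, target_pii, n=3):
    """Fraction of the target's character n-grams already present in the prompt.

    1.0 means every n-gram of the PII appears verbatim in the prompt
    (fully cued); 0.0 means no surface-form overlap at all.
    """
    target = ngrams(target_pii, n)
    if not target:
        return 0.0
    return len(target & ngrams(prompt, n)) / len(target)

def is_low_cue(prompt, target_pii, threshold=0.1, n=3):
    """Keep an evaluation pair only if its surface overlap is below a threshold
    (threshold value is an illustrative assumption, not from the paper)."""
    return lexical_cue_score(prompt, target_pii, n) < threshold
```

Under such a filter, a prompt that already spells out the target (e.g. asking to complete a sentence that contains the name) scores near 1.0 and is excluded, while a prompt with no surface overlap with the PII scores near 0.0 and is retained, so that any successful reconstruction cannot be attributed to pattern completion from the prompt itself.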