As the current literature shows, memorization in natural language models, especially Large Language Models (LLMs), poses severe security and privacy risks, as models tend to memorize personally identifiable information (PII) from training data. We introduce Randomized Masked Fine-Tuning (RMFT), a novel privacy-preserving fine-tuning technique that reduces PII memorization while minimizing performance impact. Using the Enron Email Dataset, we demonstrate that RMFT achieves an 80.81% reduction in Total Extraction Rate and an 80.17% reduction in Seen Extraction Rate compared to baseline fine-tuning, outperforming deduplication methods while incurring only a 5.73% increase in perplexity. We present MaxTER, a Pareto-optimal evaluation framework for assessing privacy-utility tradeoffs, and compare RMFT against deduplication using the Area Under the Response Curve (AURC) metric.
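The abstract does not spell out the masking procedure, but the core idea of randomized masked fine-tuning can be sketched as a data-preparation step: tokens inside detected PII spans are replaced with a mask token with some probability before fine-tuning. The function name, mask token, and `mask_prob` parameter below are illustrative assumptions, not the paper's exact implementation.

```python
import random

# Assumed placeholder token; the paper's actual choice may differ.
MASK_TOKEN = "[MASK]"

def randomized_mask(tokens, pii_spans, mask_prob=0.5, seed=None):
    """Randomly replace tokens inside PII spans with a mask token.

    tokens: list of token strings from a training example
    pii_spans: list of (start, end) index pairs marking PII (end exclusive),
        e.g. produced by a named-entity or regex PII detector
    mask_prob: probability of masking each PII token (hypothetical parameter)
    seed: optional seed for reproducible masking
    """
    rng = random.Random(seed)
    out = list(tokens)
    for start, end in pii_spans:
        for i in range(start, end):
            if rng.random() < mask_prob:
                out[i] = MASK_TOKEN
    return out

# Example: mask the email address span before fine-tuning.
tokens = ["Contact", "john.doe@enron.com", "for", "details"]
masked = randomized_mask(tokens, pii_spans=[(1, 2)], mask_prob=1.0)
# → ["Contact", "[MASK]", "for", "details"]
```

Because masking is randomized rather than deterministic, different epochs can see different maskings of the same example, which plausibly weakens verbatim memorization of PII while leaving most of the training signal intact.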