Memorization in natural language models, especially Large Language Models (LLMs), poses severe security and privacy risks, as models tend to memorize personally identifiable information (PII) from training data. We introduce Randomized Masked Fine-Tuning (RMFT), a novel privacy-preserving fine-tuning technique that reduces PII memorization while minimizing performance impact. Using the Enron Email Dataset, we demonstrate that RMFT achieves an 80.81% reduction in Total Extraction Rate and an 80.17% reduction in Seen Extraction Rate compared to baseline fine-tuning, outperforming deduplication methods while incurring only a 5.73% increase in perplexity. We also present MaxTER, a Pareto-optimal evaluation framework for assessing privacy-utility tradeoffs, and compare RMFT against deduplication using the Area Under the Response Curve (AURC) metric.
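The core masking step behind a technique like RMFT can be illustrated with a minimal sketch. This is an assumption-based illustration, not the paper's implementation: it assumes PII spans have already been located (e.g., by an NER tagger), and that each PII token is independently replaced with a mask token with some probability before fine-tuning. The names `randomized_mask` and `MASK_TOKEN` are hypothetical.

```python
import random

# Hypothetical placeholder; the actual mask token depends on the tokenizer.
MASK_TOKEN = "[MASK]"

def randomized_mask(tokens, pii_indices, mask_prob=0.5, seed=None):
    """Randomly replace PII token positions with a mask token.

    tokens:      list of token strings from a training example
    pii_indices: positions flagged as PII by an upstream detector
    mask_prob:   per-token probability of masking (randomized, so the
                 model still sees some PII context across epochs)
    """
    rng = random.Random(seed)
    out = list(tokens)
    for i in pii_indices:
        if rng.random() < mask_prob:
            out[i] = MASK_TOKEN
    return out

# Example: mask an email address flagged as PII at position 1.
tokens = ["Contact", "john.doe@enron.com", "about", "the", "Q3", "report"]
masked = randomized_mask(tokens, pii_indices=[1], mask_prob=1.0, seed=0)
print(masked)
```

With `mask_prob=1.0` the flagged token is always masked; lowering it trades off privacy against utility, which is the axis the MaxTER framework evaluates.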