Large pre-trained language models achieve impressive results across many tasks. However, recent work shows that such models may memorize a considerable fraction of their training data, which creates a privacy risk of information leakage. In this paper, we propose Ethicist, a method for targeted training data extraction through loss-smoothed soft prompting and calibrated confidence estimation; the task is to recover the suffix of a training example given its prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model parameters fixed. We further propose a smoothing loss that smooths the loss distribution over the suffix tokens, making the correct suffix easier to sample. To select the most probable suffix from a collection of sampled suffixes and to estimate the prediction confidence, we propose a calibrated confidence estimation method that normalizes the confidence of the generated suffixes with a local estimation. We show that Ethicist significantly improves extraction performance on a recently proposed public benchmark. We also investigate several factors that influence data extraction performance, including decoding strategy, model scale, prefix length, and suffix length. Our code is available at https://github.com/thu-coai/Targeted-Data-Extraction.
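The abstract does not give formulas for the smoothing loss or the calibrated confidence, so the following is only a minimal sketch of one plausible reading, not the released implementation. It assumes that (a) the smoothing term adds an extra penalty on the highest-loss suffix tokens, and (b) the "local estimation" normalizes each suffix probability by the total probability of the distinct suffixes that were actually sampled. The function names and the `top_k` and `weight` parameters are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def smoothed_loss(token_losses: List[float], top_k: int = 2, weight: float = 1.0) -> float:
    """Mean per-token loss plus an extra penalty on the top_k hardest tokens.

    Assumption: penalizing the highest-loss tokens flattens the loss
    distribution over the suffix; the exact form is not in the abstract.
    """
    if not token_losses:
        raise ValueError("token_losses must be non-empty")
    base = sum(token_losses) / len(token_losses)
    if top_k <= 0:
        return base
    hardest = sorted(token_losses, reverse=True)[:top_k]
    return base + weight * sum(hardest) / len(hardest)


def calibrated_confidence(samples: List[Tuple[str, float]]) -> Dict[str, float]:
    """Normalize sequence probabilities over the distinct sampled suffixes.

    samples holds (suffix, log_prob) pairs; repeated suffixes count once.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    unique: Dict[str, float] = {}
    for suffix, log_prob in samples:
        unique[suffix] = log_prob
    # Subtract the max log-probability before exponentiating for stability.
    max_lp = max(unique.values())
    weights = {s: math.exp(lp - max_lp) for s, lp in unique.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def select_suffix(samples: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the sampled suffix with the highest calibrated confidence."""
    conf = calibrated_confidence(samples)
    best = max(conf, key=lambda s: conf[s])
    return best, conf[best]
```

Under this reading, a suffix whose probability dominates the other sampled candidates receives a confidence near 1, while a prefix with many comparably likely suffixes yields low confidence for all of them, even if the raw sequence probabilities are similar across prefixes.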