With the advance of language models, privacy protection is receiving more attention. Training data extraction is therefore important, because it can serve as a tool for assessing privacy leakage. However, because the task is difficult, most existing methods remain proofs of concept and are not yet effective enough. In this paper, we investigate and benchmark tricks for improving training data extraction on a publicly available dataset. Most existing extraction methods follow a generate-then-rank pipeline: they first generate text candidates as potential training data and then rank them by specific criteria. Our research therefore focuses on tricks for both text generation (e.g., sampling strategy) and text ranking (e.g., token-level criteria). The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction. In our evaluation on GPT-Neo 1.3B, the proposed tricks outperform the baseline by a large margin in most cases, providing a much stronger baseline for future research. The code is available at https://github.com/weichen-yu/LM-Extraction.
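The generate-then-rank pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bigram table `BIGRAMS` and the function `toy_model` are hypothetical stand-ins for querying a real model such as GPT-Neo 1.3B. Top-k sampling with temperature is only one example of a sampling strategy. Ranking by perplexity (the exponentiated mean of per-token negative log-likelihoods) is a common baseline criterion, not necessarily one of the token-level criteria studied in the paper.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical toy bigram "language model": every token that can be emitted
# has a row. A real attack would query the target model (e.g., GPT-Neo 1.3B).
BIGRAMS: Dict[str, Dict[str, float]] = {
    "my": {"ssn": 0.6, "name": 0.3, "cat": 0.1},
    "ssn": {"is": 0.9, "my": 0.1},
    "name": {"is": 0.8, "my": 0.2},
    "cat": {"is": 0.5, "my": 0.5},
    "is": {"123": 0.7, "bob": 0.2, "my": 0.1},
    "123": {"my": 1.0},
    "bob": {"my": 1.0},
}


def toy_model(context: Sequence[str]) -> Dict[str, float]:
    """Return next-token probabilities conditioned on the last token only."""
    return BIGRAMS[context[-1]]


def top_k_sample(probs: Dict[str, float], k: int, temperature: float,
                 rng: random.Random) -> str:
    """Sample a token from the k most probable tokens after temperature scaling."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Raising p to 1/T and renormalizing is equivalent to softmax(log p / T);
    # random.choices normalizes the weights itself.
    weights = [p ** (1.0 / temperature) for _, p in top]
    return rng.choices([tok for tok, _ in top], weights=weights, k=1)[0]


def generate_candidates(model, prefix: Sequence[str], num_candidates: int,
                        length: int, k: int = 2, temperature: float = 0.7,
                        seed: int = 0) -> List[List[str]]:
    """Step 1 (generate): sample continuations of `length` tokens each."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_candidates):
        tokens = list(prefix)
        for _ in range(length):
            tokens.append(top_k_sample(model(tokens), k, temperature, rng))
        candidates.append(tokens[len(prefix):])
    return candidates


def perplexity(model, prefix: Sequence[str], suffix: Sequence[str]) -> float:
    """Exponentiated mean negative log-likelihood of `suffix` given `prefix`."""
    context = list(prefix)
    nll = 0.0
    for tok in suffix:
        nll -= math.log(model(context)[tok])
        context.append(tok)
    return math.exp(nll / len(suffix))


def rank_candidates(model, prefix: Sequence[str],
                    candidates: List[List[str]]) -> List[List[str]]:
    """Step 2 (rank): lowest perplexity first, treated as most likely memorized."""
    return sorted(candidates, key=lambda c: perplexity(model, prefix, c))


if __name__ == "__main__":
    prefix = ["my"]
    candidates = generate_candidates(toy_model, prefix, num_candidates=5, length=3)
    for cand in rank_candidates(toy_model, prefix, candidates):
        print(" ".join(cand), round(perplexity(toy_model, prefix, cand), 3))
```

Keeping generation and ranking as separate functions mirrors the two stages, so a different sampling strategy or ranking criterion can be swapped in without touching the other stage.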