With the advancement of language models, privacy protection is receiving increasing attention. Training data extraction is therefore of great importance, as it can serve as a potential tool for assessing privacy leakage. However, because the task is difficult, most existing methods are proof-of-concept and remain insufficiently effective. In this paper, we investigate and benchmark tricks for improving training data extraction on a publicly available dataset. Most existing extraction methods follow a generate-then-rank pipeline: they first generate text candidates as potential training data and then rank them according to specific criteria. Our research therefore focuses on tricks for both text generation (e.g., sampling strategy) and text ranking (e.g., token-level criteria). The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction. In evaluations on GPT-Neo 1.3B, our proposed tricks outperform the baseline by a large margin in most cases, providing a much stronger baseline for future research.
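The generate-then-rank pipeline can be illustrated with a small sketch. It is hypothetical and makes several assumptions. A toy bigram table (`TOY_LM`) stands in for a real causal language model such as GPT-Neo 1.3B. Top-k sampling with temperature serves as the generation strategy. Sequence-level perplexity, a common baseline criterion, serves as the ranking criterion. The sketch shows only the structure of the pipeline and does not reproduce the tricks proposed in the paper. All function and variable names are illustrative.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy "language model": bigram next-token probabilities.
# A real pipeline would query a causal LM (e.g. GPT-Neo 1.3B) instead.
ToyLM = Dict[str, Dict[str, float]]

TOY_LM: ToyLM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"sat": 0.2, "ran": 0.8},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}


def sample_next(lm: ToyLM, prev: str, rng: random.Random,
                top_k: int, temperature: float) -> str:
    """Top-k sampling with temperature from the toy next-token distribution."""
    # Keep only the k most probable continuations.
    candidates = sorted(lm[prev].items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    tokens = [tok for tok, _ in candidates]
    # Temperature rescaling p^(1/T); random.choices normalises the weights.
    weights = [p ** (1.0 / temperature) for _, p in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(lm: ToyLM, rng: random.Random, top_k: int = 2,
             temperature: float = 1.0, max_len: int = 10) -> List[str]:
    """Generation step: sample one candidate sequence token by token."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        tokens.append(sample_next(lm, tokens[-1], rng, top_k, temperature))
    return tokens


def perplexity(lm: ToyLM, tokens: List[str]) -> float:
    """exp of the mean negative log-likelihood of each predicted token."""
    nll = [-math.log(lm[prev][cur]) for prev, cur in zip(tokens, tokens[1:])]
    return math.exp(sum(nll) / len(nll))


def generate_then_rank(lm: ToyLM, n: int, seed: int = 0,
                       **gen_kwargs) -> List[Tuple[float, str]]:
    """Generate n candidates, deduplicate, and rank by perplexity (lowest first)."""
    rng = random.Random(seed)
    unique = {tuple(generate(lm, rng, **gen_kwargs)) for _ in range(n)}
    scored = []
    for seq in unique:
        text = " ".join(t for t in seq if t not in ("<s>", "</s>"))
        scored.append((perplexity(lm, list(seq)), text))
    # Low perplexity = the model is confident, a common signal of memorisation.
    return sorted(scored)


if __name__ == "__main__":
    for score, text in generate_then_rank(TOY_LM, n=200, seed=0):
        print(f"{score:.3f}  {text}")
```

In this sketch, changing `top_k` or `temperature` alters which candidates reach the ranking stage, and replacing `perplexity` changes how they are ordered. These two stages correspond to the generation and ranking tricks discussed above.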