The rapid development of language models (LMs) has made them more accessible and more widely used than ever before. On the one hand, powerful LMs achieve state-of-the-art performance on numerous downstream NLP tasks. On the other hand, growing attention is being paid to the data-leakage risks posed by unrestricted model access. To address these risks, many recent works propose privacy-preserving language models (PPLMs) with differential privacy (DP). Unfortunately, differences among DP implementations make a fair comparison of existing PPLMs difficult. In this paper, we present PrivLM-Bench, a multi-perspective privacy evaluation benchmark that empirically and intuitively quantifies the privacy leakage of LMs. Instead of only reporting DP parameters, PrivLM-Bench sheds light on the often neglected privacy of inference data during actual usage. PrivLM-Bench first clearly defines multi-faceted privacy objectives. It then constructs a unified pipeline for private fine-tuning. Lastly, it runs existing privacy attacks on LMs against the pre-defined privacy objectives and reports the attack results as the empirical evaluation. These results are used to evaluate the privacy leakage of various PPLMs fairly and intuitively. We conduct extensive experiments on three GLUE datasets with mainstream LMs.