Numerous studies have highlighted the privacy risks associated with pretrained large language models. In contrast, our research offers a different perspective by demonstrating that pretrained large language models can also contribute effectively to privacy preservation. We propose DP-Prompt, a locally differentially private mechanism that leverages pretrained large language models and zero-shot prompting to counter author de-anonymization attacks while minimizing the impact on downstream utility. When DP-Prompt is used with a powerful language model such as ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks, and DP-Prompt outperforms existing approaches by a considerable margin despite its simpler design. For instance, on the IMDB dataset, DP-Prompt (with ChatGPT) fully recovers the clean sentiment F1 score while reducing the author identification F1 score by 46\% against static attackers and by 26\% against adaptive attackers. We conduct extensive experiments across six open-source large language models with up to 7 billion parameters to analyze various aspects of the privacy-utility tradeoff.
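The abstract does not spell out how a language model's output becomes locally differentially private. One standard way to obtain this guarantee from a decoder, which we assume here purely for illustration, is to clip each token's logits to a fixed range and sample at temperature T: this is an instance of the exponential mechanism, with per-token privacy cost 2(b_max - b_min)/T. The prompt template and all function names below (`build_prompt`, `per_token_epsilon`, `sample_token`, `paraphrase_epsilon`) are hypothetical; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical zero-shot paraphrase prompt; the exact wording is an assumption.
PROMPT_TEMPLATE = "Document: {doc}\nParaphrase of the document:"


def build_prompt(doc: str) -> str:
    """Wrap a private document in a zero-shot paraphrasing instruction."""
    return PROMPT_TEMPLATE.format(doc=doc)


def per_token_epsilon(b_min: float, b_max: float, temperature: float) -> float:
    """Privacy cost of one sampled token under the exponential mechanism.

    Clipping every logit to [b_min, b_max] bounds its sensitivity by
    (b_max - b_min). Sampling with probability proportional to
    exp(logit / T) is then the exponential mechanism with
    eps = 2 * sensitivity / T.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 2.0 * (b_max - b_min) / temperature


def sample_token(logits, b_min, b_max, temperature, rng=random):
    """Sample one token index from clipped, temperature-scaled logits."""
    clipped = [min(max(x, b_min), b_max) for x in logits]
    scaled = [x / temperature for x in clipped]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def paraphrase_epsilon(num_tokens, b_min, b_max, temperature):
    """Total cost of a paraphrase via basic sequential composition over tokens."""
    return num_tokens * per_token_epsilon(b_min, b_max, temperature)
```

For example, with logits clipped to [-2, 2] and T = 2, each token costs eps = 4, so a 10-token paraphrase costs eps = 40 under basic composition. Raising T lowers eps and makes the output more random, which is one concrete form of the privacy-utility tradeoff the abstract refers to.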